I'm not saying this was "right" in any sense, but it wasn't just foolish old timers not recognizing that a "better" solution was possible. When you grew up having every single bit of memory threaded by hand (and costing macroscopic amounts of money), you think about memory efficiency differently.
Presumably this was inspired by existing practices in the assembly-language programs.
Many early programming languages followed the example of FORTRAN.
In October 1964, John McCarthy (who had not only led the development of the first LISP, but had also been a major contributor to ALGOL) proposed the keyword "union" and advocated for safer implementations of disjoint union types. The first such implementation was in ALGOL 68, a few years later.
Unfortunately, the C language took the "union" keyword from ALGOL 68, but instead of implementing proper disjoint union types, the C "union" was made essentially identical to the FORTRAN EQUIVALENCE of 1956.
The "variant record" of Pascal was not really better, so it can also be counted as a failure to implement proper union types.
For a long time Pascal and C were among the most popular programming languages, spreading bad habits in the use of unions.
Accessing a value was more laconic in FORTRAN, because you omitted the union name and the "." that are required in C.
The fact that a union is considered a type in C allows the use of a typedef, which removes the need to repeat the definitions for multiple variables, but that is about the only help that you get in C.
Both in C and in FORTRAN you must know the current type of the stored value, and there is no protection against using the wrong type. If the current type cannot be determined statically, in both languages you must manually manage another variable storing a type tag.
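The same discipline is easy to sketch in a modern language. Here is a hedged Go illustration (the Value and Kind names are invented for this example) of the hand-maintained tag that both C and FORTRAN force on the programmer, with nothing preventing a mismatched read:

```go
package main

import "fmt"

// Kind is the manually managed type tag. Nothing in the language ties
// it to the field that was actually written.
type Kind int

const (
	KindInt Kind = iota
	KindFloat
)

// Value plays the role of a C union plus its hand-maintained tag.
type Value struct {
	Kind Kind
	I    int64
	F    float64
}

func describe(v Value) string {
	switch v.Kind {
	case KindInt:
		return fmt.Sprintf("int %d", v.I)
	case KindFloat:
		return fmt.Sprintf("float %g", v.F)
	}
	return "unknown"
}

func main() {
	v := Value{Kind: KindInt, I: 42}
	fmt.Println(describe(v))

	// The compiler is silent about this mismatch: the tag now says
	// "float" but only I was ever written, so F reads as a stale
	// value (zero here; uninitialized memory in the C case).
	v.Kind = KindFloat
	fmt.Println(describe(v))
}
```

A checked disjoint union, as McCarthy advocated, makes the tag and the payload inseparable, so the mismatched read above becomes an error instead of silent misbehavior.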
On many word-addressed machines, explicitly the PDP-7, it was only the instructions that changed the interpretation of a word, with ADD being one's-complement addition and TAD being two's-complement.
Remember, we only got B/C because the PDP-7 didn't have enough RAM to implement FORTRAN.
For similar reasons, C's switch cases fall through: the PDP-7's SAD instruction forced it. That fall-through was then abused to support lower case on the PDP-11, which would have been powerful enough for the kind of union you are talking about.
Midas/Macro assembly is tangentially related, but it is really a side effect of word-addressable, accumulator/index machines.
IIRC, Lisp is a good example of the difference between equality by value, by reference, or even predicated equality.
If you want to think about just how limited the PDP-7 was, look at the instruction list:
www.bitsavers.org/pdf/dec/pdp7/PDP7_Instruction_list.pdf
https://github.com/torvalds/linux/blob/89be9a83ccf1f88522317...
Edit: A bit of background: https://lwn.net/Articles/565097/
But not to worry, the unreadable mess of C unions is not going away. struct folio will eventually absorb all those fields, and more. The only difference is there's a single folio for a whole set of pages, so moving the data there will save a significant amount of memory.
https://github.com/torvalds/linux/blob/89be9a83ccf1f88522317...
I think Go + sum types could be good. Maybe. But, honestly, it's hard to say for sure. First-order effects are great: we have sum types and can use them to model problems. Second-order effects get muddy: We have sum types and they are the ideal solution for a bunch of problems, but without other features they're awkward to use. For example... now you can do a Result type... but if you want to return multiple values in a Result, you need a set/tuple type too. If you do that, is Go's multiple return values concept still a good idea? I could probably go on but hopefully this illustrates the point.
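To make the knock-on effect concrete, here is a hypothetical Go sketch (Result, Pair, Ok and Err are invented names, not any standard library) of how a Result type immediately calls for a tuple type once a function has two results:

```go
package main

import "fmt"

// Pair is the tuple-like wrapper the text alludes to: a generic Result
// can carry only one value, so two results must be bundled.
type Pair[A, B any] struct {
	First  A
	Second B
}

// Result is a minimal error-or-value type.
type Result[T any] struct {
	value T
	err   error
}

func Ok[T any](v T) Result[T]      { return Result[T]{value: v} }
func Err[T any](e error) Result[T] { return Result[T]{err: e} }

// divmod has two "success" values, so its Result must wrap a Pair,
// whereas idiomatic Go would just write (int, int, error).
func divmod(a, b int) Result[Pair[int, int]] {
	if b == 0 {
		return Err[Pair[int, int]](fmt.Errorf("division by zero"))
	}
	return Ok(Pair[int, int]{a / b, a % b})
}

func main() {
	if r := divmod(7, 2); r.err == nil {
		fmt.Println(r.value.First, r.value.Second) // 3 1
	}
}
```

The friction is visible immediately: the Pair wrapper competes with Go's native multiple return values, which is exactly the design tension described above.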
I think a lot of people don't acknowledge why Go is truly so successful. Well OK, first elephant in the room: it's obviously successful because it's backed by Google, a company that can and did throw immense resources at making the implementation and standard library pretty damn good, as well as helping to push it, but that alone wouldn't have propelled it to where it is today (I mean, compare it to Dart.)
But beyond that, Go is successful because Go code is very simple to write and leaves the programmer with relatively few things to be concerned about, and yet in spite of that, Go code generally runs reasonably fast, uses relatively small amounts of memory and is generally not very crash-prone. (Some people would happily debate that, but I trust software like restic, esbuild, rclone and Syncthing every day without fail, among other smaller Go programs. I'm OK with making that assertion.)
If you put in the effort to make really good Rust code, the effort is not wasted, but it is a lot of effort when the stupid Go code often does the trick. Consider Discord's presence service: they switched from Go to Rust and massively cut costs and improved latency. Rust wins, Rust better? But... most services will never scale to the point where marginal improvements in latency, throughput or RAM are going to be worth a lot of man-hours worth of programming and ongoing maintenance. Usually throwing a couple more VMs at the problem is just going to be the path of lesser resistance. This was always the case when comparing lower-level to higher-level, but Go amplifies it because the performance gap isn't as big, but the complexity gap remains very large or maybe gets larger.
Is writing Rust code really that hard? No, not at all; it's more that writing good Go code is so relatively easy: the language is simple and the standard library is loaded with useful tools. Go's CLI flag parser is pretty unimpressive, but so many projects just need a very basic flag parser, and it works totally fine for that; you just don't need to overthink it 99.99% of the time. Same for net/http, the built-in image and compression codecs, the TLS stack, and more. Meanwhile, designing and maintaining high-quality Rust crates is just relatively hard. You've got to worry about async vs sync, various cfg options like no_std and no-alloc, and dealing with a lot of wrinkles in the language. Want to use a dyn trait? You'll need to make sure the trait doesn't have any functions with generic parameters or async; it makes perfect sense, but adds tension: you want to avoid unnecessary runtime costs in the ideal case but still have the flexibility to use a dyn trait in other cases. Not to mention the borrow checker and how it interacts with a lot of these design choices. Go code is much dumber, but oftentimes it's sufficient.
And that's the crux of it. Go is simple in a stupid way, rather than an elegant way, but it really makes the most of it. If you want to supplant Go at what it does best, trying to make a better programming language overall may be a bit misguided. It's probably not that hard to come up with a programming language design that is "better" than Go. What's hard, IMO, is coming up with a programming language where the cognitive load increase relative to Go is met with a pay-off that people using Go would consider genuinely worth it. Something like guaranteed data race safety is definitely good enough if it's something someone needs or at least strongly wants for a given use case. Sum types, on the other hand, are a very nice thing to have that makes modelling data easier and less error-prone, but not having them isn't exactly the end of the world... In Go, people sometimes emulate them with interfaces and type-switches, and it's not fantastic, but it's usually sufficient.
Ocaml probably could/should be more successful, but I'm not sure it competes with Go, I think they're in entirely different wheelhouses. Ocaml feels like it doesn't want to compete with Go, but then again, I only have relatively surface-level knowledge of Ocaml, so I can't really say for sure.
It's basically a fork of the Go lexer/parser that adds Result/Option/Tuple/Set... propagation operators (and more)
and it compiles down to Go code.
Most of the issues you discuss with Rust are not issues in OCaml, as OCaml has GC. The language is simpler, in that programs are concerned with fewer concepts (e.g. no lifetimes), but less expressive in that they cannot formally talk about these concepts (though see the Jane Street work to add in some Rust-like concerns: https://blog.janestreet.com/oxidizing-ocaml-locality/)
If comments on the web are anything to go by, being familiar to C and JS programmers is one of the main reasons for Go's success. I think it has plenty of its own specifics, it's not like e.g. C# which really did start as a straightforward Java clone, but OCaml has even more differences.
This does seem to be quite a conundrum. I don't think there's any argument regarding the merits of functional programming as a paradigm, but imperative just seems easier to wrap one's head around, and frankly most code is really boring and doesn't need to be able to do the super powerful things that functional programming can do with relative ease.
I feel like I'm playing with fire by saying this here, but I think it's true.
I think functional and imperative programming, in practice, tend to have different types of constraints, and there are different things that are easy to reason about in them, and encourage different reasoning methods and approaches, and ways of proving properties about them.
I think that for learning functional programming, being all-in on purely functional programming and never having side-effects or mutation, can be useful. But for practical programming, functional programming is not in itself a virtue just by itself. One should always consider the trade-offs of the specific case one is facing if one is not learning, as such. If a project has decided on a trade-off where purely functional programming is chosen, then maybe continuing with pure FP is fine; but even in such a case, a function that from outside is purely functional, but uses imperative programming internally, for instance for the sake of optimization, can make sense.
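As a small illustration of that last point, in any language a function can be observably pure while being imperative inside; a Go sketch:

```go
package main

import (
	"fmt"
	"sort"
)

// sorted is pure from the outside: it never mutates its argument and
// its result depends only on the input. Internally it uses an ordinary
// imperative in-place sort for efficiency.
func sorted(xs []int) []int {
	out := make([]int, len(xs))
	copy(out, xs)
	sort.Ints(out)
	return out
}

func main() {
	in := []int{3, 1, 2}
	fmt.Println(sorted(in)) // [1 2 3]
	fmt.Println(in)         // [3 1 2], untouched
}
```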
There are different advantages and disadvantages to having: pure FP; imperative programming; or different kinds of mixes of FP and imperative programming.
Overall, my approximate opinion is that it does not make sense to be dogmatic about functional programming, except when learning FP.
https://stackoverflow.com/questions/7717691/why-is-the-minim... is arguably a relevant example, though a proper in-place implementation in Haskell might be fairer for comparison. I am not certain whether Haskell's lazy evaluation would help or hinder the analysis; OCaml or SML might be better languages for implementing and analysing an in-place variant.
Also, Hacker News is a censorship hole.
https://stackoverflow.com/a/7719971
You can see that the Haskell code is more verbose than the C code.
The kind of folks that dropped Python and Java for Go, would not have picked up OCaml even if it had a better concurrency story 10-15 years ago.
It has enough ML influence on it, as a simplified Scala, and it can also compile to native in various ways.
On the other hand, you will notice that the only things the Android team cared about in Go were the Soong build system and parts of the GPU debugger.
Maybe, but once you have eg an Option or Either (a.k.a. Result) type, you typically really want to have some functions that work generically on all versions of them. (Though you could probably get away with using Go's version of void*, the empty interface, in a lot of cases?)
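Since Go 1.18, generics make such functions expressible without resorting to the empty interface; a hypothetical Option sketch (Some, None, Map and GetOr are invented names, not standard library):

```go
package main

import "fmt"

// Option is a minimal generic option type; field names are illustrative.
type Option[T any] struct {
	value T
	ok    bool
}

func Some[T any](v T) Option[T] { return Option[T]{value: v, ok: true} }
func None[T any]() Option[T]    { return Option[T]{} }

// Map is the kind of function that works generically on every Option,
// which the comment above is asking for.
func Map[T, U any](o Option[T], f func(T) U) Option[U] {
	if !o.ok {
		return None[U]()
	}
	return Some(f(o.value))
}

// GetOr unwraps the option with a fallback for the empty case.
func (o Option[T]) GetOr(fallback T) T {
	if o.ok {
		return o.value
	}
	return fallback
}

func main() {
	doubled := Map(Some(21), func(x int) int { return x * 2 })
	fmt.Println(doubled.GetOr(0)) // 42

	empty := Map(None[int](), func(x int) int { return x * 2 })
	fmt.Println(empty.GetOr(-1)) // -1
}
```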
Just add first-class tuples to the language.
If we look behind the curtain, Go functions somehow return tuples that are automatically unrolled into the respective variables, sometimes. That's why, if you have func f(a, b, c) and func g() (b, c), you currently can't do f(a, g()).
For some reason it does not unroll it automatically here; the result remains in "tuple" form?
I think it may leave some space to have tuples if needed.
In any case, I don't see it as technically infeasible.
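A small Go program showing the current rule: a multi-valued call is expanded only when it is the sole argument and the arities match exactly:

```go
package main

import "fmt"

func g() (int, int) { return 2, 3 }

func f(a, b, c int) int { return a + b + c }

func sum2(a, b int) int { return a + b }

func main() {
	// f(1, g()) does not compile ("multiple-value g() in single-value
	// context"); you must unroll the "tuple" by hand first.
	b, c := g()
	fmt.Println(f(1, b, c)) // 6

	// Only when g()'s results exactly match the parameter list does
	// the special case kick in:
	fmt.Println(sum2(g())) // 5
}
```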
It can be changed more easily than in other languages because everyone statically links, but it’s not trivial.
Like Perl's sigils? Or PHP's distinction between 'normal' variables with a $ and variables of function type which have no marking but are case insensitive?
type
      StatusCodes = (Success, Ongoing, Done)

or, if you prefer, while bikeshedding what keyword to use:

    type
      StatusCodes enumeration (Success, Ongoing, Done)

Naturally, it is so much better to write:

    type StatusCodes int

    const (
        Success StatusCodes = iota
        Ongoing
        Done
    )

type Color interface {
        isColor()
    }

    type Red struct{}

    func (r Red) isColor() {}

    type Green struct{}

    func (g Green) isColor() {}

    type RGB struct {
        Red   byte
        Green byte
        Blue  byte
    }

    func (r RGB) isColor() {}

    func PrintColor(color Color, text string) {
        switch color.(type) {
        case Red:
            // print in red here
        case Green:
            // print in green here
        case RGB:
            // handle the RGB
        }
    }
This doesn't prevent the zero value of a Color being 'nil', but it does make "a value of type Color" effectively a sum type, because you can't do anything useful with it without unpacking it in a type-aware way. There is no way in Go to have a Red but act like you have an RGB; values always have their types. (There is a common misconception among Go programmers that this doesn't seal the type because you can implement an "isColor" method on your own types in other packages, but it won't work. Try it if you want to be sure.)
One could argue they've effectively already been bolted on, all that is missing is some syntax gloss like
sum Color {
        Red{}
        Green{}
        RGB{
            Red   byte
            Green byte
            Blue  byte
        }
    }

that would simply unsugar to the above.

When he designed Pascal, he also avoided a lot of the more advanced features of ALGOL 68, thinking it too complex.
Implementing disjoint unions correctly, i.e. allowing them to be used only like an enumeration in the variable tested by a select/case/switch statement, and then only as the correct type in each of the alternatives, introduces a little overhead in a compiler, but it is certainly not a serious source of bloat in comparison with most other things that a compiler must do.
If the programming language designer had clearly in mind how disjoint union types must work, they would have been easy to implement even for the minicomputers and microcomputers of the seventies.
Even Go v1.0's type system is more advanced than Oberon-07's final form.
In my opinion, the most important contributions of Niklaus Wirth to programming languages have been in his early work on Euler, PL360 and ALGOL W, which introduced various features that were novel at that time.
Starting with Pascal in 1970, his programming languages remained reasonably good for teaching programming and how to implement a compiler, due to their simplicity, but all of them were seriously behind contemporaneous languages.
While Mesa of Xerox was a nice and innovative language, that cannot be said about Modula, Wirth's attempt to reimplement similar features after his sabbatical at Xerox, which was only mediocre.
On the other hand, the early languages of Wirth were very innovative; e.g. Euler was one of the first two languages with pointers, the other being CPL. In contrast with CPL, which had implicit pointer dereferencing, Euler had explicit address-of and indirection operators, and it got their syntax right, unlike C, where the indirection operator was mistakenly defined as prefix instead of postfix.
Zig is basically the Modula-2 type system in C clothing, plus comptime; if only people had been equally hyped back in the 1980s.
The usual "Why Pascal..." falls flat in the presence of Modula-2, which was actually designed as a systems language, not as a language for learning about programming.
Oberon (the 1992 original), for its simplicity, introduced a wider audience to the concept that system programming with automatic resource management isn't something out of this world, even though Cedar is more interesting in features.
I was more interested in Component Pascal and Active Oberon, even though those were the work of other researchers at ETHZ.
Nonetheless it was his work that inspired me to dig into everything Xerox PARC was doing, and to discover there was more happening there than only Smalltalk.
I became amazed at the work done across Interlisp-D, Mesa and Cedar, and at how advanced their ideas of what an IDE is supposed to look like were, ideas that many mainstream languages still can't offer.
So in a sense that was also a contribution from Niklaus Wirth to everyone who got interested in his work and decided to go down the rabbit hole.
https://archive.is/oTbMW works though.
This seems like a mistake. At the end of the day, a bunch of code and logic has to be written somewhere, and I think it's better done outside the data object, at least some of the time.
Imagine you have the classic Shape class / interface and someone wants to write some code to determine whether a Shape is happy or sad, based on their synesthesia. What are they supposed to do? I guess just add a happy_or_sad() method to the interface? Like, we're just going to pile--err, I mean, "encapsulate"--every possible thing that can be done with the data into the data object?
The OOP way is probably some Shape class hierarchy with a Shape superclass and a bunch of specific Square, Circle, Triangle, subclasses. So I guess you go and modify a dozen subclasses to add your happy_or_sad() method. And you're definitely going to have to fork the code because nobody wants to upstream your personal feelings about which Shapes are happy or sad.
It's better to have a sum type for your Shape and then everyone can put all their code and logic outside of the Shape itself, and the type system will ensure, at compile time, that no Shape variants have been missed, so refactoring is assisted by the type system.
They are all equivalent in principle, but some of them are a lot more annoying to work with, especially when you want to do a pattern matching over multiple values at the same time, or match on nested patterns.
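For example, with the sealed-interface encoding common in Go (a hypothetical Shape/Circle/Square sketch), matching two values at once already needs a type switch plus extra assertions, where tuple patterns in an ML-family language would be a single match:

```go
package main

import "fmt"

// A tiny sealed-interface "sum type", in the style discussed elsewhere
// in this thread; the Shape/Circle/Square names are invented here.
type Shape interface{ isShape() }

type Circle struct{ R float64 }
type Square struct{ S float64 }

func (Circle) isShape() {}
func (Square) isShape() {}

// sameKind "matches" on a pair of values. Each outer case needs its own
// inner assertion; a language with tuple patterns would write a single
// match over (a, b).
func sameKind(a, b Shape) bool {
	switch a.(type) {
	case Circle:
		_, ok := b.(Circle)
		return ok
	case Square:
		_, ok := b.(Square)
		return ok
	}
	return false
}

func main() {
	fmt.Println(sameKind(Circle{1}, Circle{2})) // true
	fmt.Println(sameKind(Circle{1}, Square{2})) // false
}
```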
One nice thing about the visitor pattern is that it doesn’t have to match the type hierarchy. For example, you could have a visitor interface method that is invoked for blue shapes, even if there is no BlueShape type. Similarly, the same type hierarchy can support multiple visitor interfaces, so that you can perform different case distinctions on the same value. This is something that sum types can’t do.
Haskell's pattern synonyms should be able to handle this?
The problem with the visitor pattern is that it doesn't compose well. If you want to match on two values at once, or match deeper into a value, that works well with ordinary pattern matching, but is annoying to piece together with the visitor pattern.
I agree that syntactic sugar for visitor ”matching” would be nice, and it’s something a language could add. The visitor pattern itself doesn’t prevent adding such syntax.
It's not that bad in practice.
You can also simulate it by using if/else everywhere.
At this point, that means you have zero language feature supporting the use case, and type safety is up to the developers implementing patterns correctly everywhere.
In languages of ML lineage, we can combine both approaches with a mix of sum types and functors/type classes.
Yes, by default. But doesn't eg Java let you mark classes as final or something like that?
> In languages of ML lineage, we can combine both approaches with a mix of sum types and functors/type classes.
I think Rust allows something similar, but I'm not sure whether you'd call it 'ML lineage'?
Additionally, modern Java has sum types modeled the same way as Scala's.
Rust is to some extent inspired by ML languages, hence why several OCaml and Haskell refugees also hang around Rust.
Note that, instead of my worse Optional class, what you are describing is a Shape class where there is a different function for each supported shape. If you add a new shape, everywhere that made a shape visitor would have to be updated to deal with the new shape. (A sibling post described this.)
Amusingly, I recall coworkers that did not want to use the "acceptVisitor" method and would force cast to call over to the methods directly. Caught me incredibly off guard.
In that talk, Casey Muratori refers to Simula; a PDF of it can be found at https://www.mn.uio.no/tjenester/it/hjelp/programvare/simula/... . You may want to use an OCR tool on that PDF; for instance, ocrmypdf is available for a number of Linux distributions. I am not sure if it is the same version of Simula as the one being discussed, but it does have the "connection" statement, at PDF page 56, which has "inspect" as its first syntactical element. That does look vaguely similar to the pattern matching of ML, but AFAICT it does not support a number of significant features that many love about modern pattern matching, such as nested patterns. Does it have field bindings as part of a pattern, and matching against specific constant values, or only matching on the class type? I am not sure if it supports exhaustiveness checking. Does it mandate a finite number of possibilities, to help exhaustiveness checking? And the "connection" statement has two variants. AFAICT, it is the kind of abstraction that is primitive enough that one can get close to its functionality with "switch" in C++ together with a type cast, and it is a far cry from what Standard ML (later?) supported. In that light, it might not be surprising that it was not included in C++.
When was pattern matching as we know it in modern times invented, or was it a gradual evolution? https://en.wikipedia.org/wiki/Hope_(programming_language) is cited as introducing https://en.wikipedia.org/wiki/Algebraic_data_type in the 1970s. And Hope had for instance this spin on just one aspect of pattern matching:
> Changing the order of clauses does not change the meaning of the program, because Hope's pattern matching always favors more specific patterns over less specific ones.
This is different from modern pattern matching, where the order (AFAIK generally across modern languages) does matter.
I am not sure that Casey Muratori did a good job of researching this topic, but I am not sure if and how much I can fault him, since the topic is complex and huge and may require a lot of research. Researching the history of programming languages may be difficult, since it would both require a high technical level and also have to be focused on history. One could probably have several full-time university positions just spending their time researching, documenting and describing the history of programming languages. And the topic is a moving target, with the professionals having to have a good understand of multiple languages and of programming language theory in general, and preferably also some general professional software development experience.
All in all, the data types and pattern matching of the 1970s might be extremely different from the discriminated unions and pattern matching of the 1990s. C++ also does not have garbage collection, which complicates the issue. Rust, for instance, that also does not have garbage collection, has different binding modes for the bindings in pattern matches.
It is important to note that subtyping and inheritance are different. And even FP languages can use subtyping.
I think that Casey Muratori (and Graydon Hoare, if he has not already read it) could be interested in reading the book Types and Programming Languages, even though that book is old by now and may not contain a lot of newer advancements and theory. I also think that Casey Muratori could have benefited (with regard to this talk, at least) from learning and using Scala and its sealed traits for pattern matching; if I recall correctly, Scala had as one of its objectives to attempt to unify OOP and FP. I do agree that OOP can be abused, and personally I am lukewarm on inheritance, especially as direct modelling of a domain as discussed in the talk, without deeper thought about whether such an approach is good relative to other options and trade-offs. But subtyping, as well as objects that can be used as a kind of "mini-module", is typically more appealing than inheritance IMO. "Namespacing" of objects is also popular.
Some theory and terminology also discuss "open" and "closed" types.
And, after all, Haskell has type classes, which is not OOP, but is relevant for ad-hoc polymorphism (is Casey Muratori familiar with type classes or ad-hoc polymorphism?), Rust has traits, not quite the same as type classes but related. Scala has various kinds of implicits in regards to that. And Rust also has "dyn" traits, not so commonly used, but are available.
https://stackoverflow.com/questions/3748592/what-are-nk-patt...
> What do they mean by "n+k patterns"? I guess it's the second line, but I don't get what might be wrong with it. Could anyone explain what is the issue there? Why aren't these n + k patterns allowed any more in Haskell 2010?
Should be "understanding".