Also, if you care about function call performance, I guess you'd use PyPy. Have you tried to run the benchmarks (with appropriate warmup) on PyPy to see if the results carry over?
Regarding PyPy, I did run some benchmarks recently (I use pytest-benchmark). The advantage over other libraries remains and the magnitude is similar, but when I compare with custom if/isinstance code, that code is optimized a lot more aggressively and it gains a significant edge (let's say ~5x). Now, since I'm into codegen, part of me feels like meeting the challenge and figuring out a way to generate optimal code from a set of signatures, but I think I've spent enough time as it is haha.
You called that a joke, but I think it's just the truth.
See for example https://docs.raku.org/language/functions#Multi-dispatch
In particular, type declarations, as they were first introduced in Python, did not have a run-time effect, whereas they very much do with multimethods.
(Dataclasses are kind of an exception to this rule, but one that feels far less fundamental than multi dispatch would).
Can people list some times when they actually used multimethods to solve a real problem? How did it work out? Would you do it again?
One case where I actually rely on multiple dispatch is conversion to more or less structured data. Inspired by Clojure, I have a function `conform(OutputDataType, input_data::SomeType)` and specific implementations depend on both types involved.
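(A rough sketch of what that pattern can look like with ovld's decorator-based overloads — the Point class and the specific conversions are made up for illustration, and I'm assuming dispatch on type[...] annotations works the way the deserialize example further down the thread describes:)

    from dataclasses import dataclass
    from ovld import ovld

    @dataclass
    class Point:
        x: float
        y: float

    @ovld
    def conform(cls: type[Point], data: dict):
        # dict -> Point
        return cls(**data)

    @ovld
    def conform(cls: type[Point], data: tuple):
        # (x, y) pair -> Point
        return cls(*data)

    @ovld
    def conform(cls: type[dict], data: Point):
        # Point -> plain dict
        return {"x": data.x, "y": data.y}

    conform(Point, {"x": 1.0, "y": 2.0})   # Point(x=1.0, y=2.0)
    conform(dict, Point(3.0, 4.0))         # {'x': 3.0, 'y': 4.0}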
Multiple dispatch is also really helpful when two libraries don't quite do the same thing. In Python, pandas, numpy, and pytorch all have slightly different implementations of x.std() (standard deviation) with slightly different APIs. This means you can't write code that's generic across numpy arrays and pytorch tensors. In Python this could only easily be fixed if the library authors coordinated, but with multiple dispatch you can just fix it in your own code.
In particular, the fact that these types would silently give you different answers if you called `.std()` was a big headache
It was very common for us to want to be generic over pandas series and numpy arrays. A bit less so with pytorch tensors, but that was because we just aggressively converted them to numpy arrays. Fundamentally these three are all very similar types so it's frustrating to have to treat them differently.
I was writing unit tests once against histograms. That code is super finicky, and I couldn't get pandas and polars numbers to tie out. I wasn't super concerned with the exact output for my application, just that the number of buckets was the same and they were roughly the same size. Just bumping to a new version of numpy would result in test breakages because of floating point rounding errors. I'm sure there are much more robust numerical testing approaches I could have used.
Before switching to Julia we eventually standardized on numpy.std with ddof=1, and with some error tolerances in our tests.
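(For the curious, a rough sketch — not their actual code — of the "fix it in your own code" approach, normalizing all three libraries to the sample standard deviation (ddof=1). Only the array argument's type matters here, so functools.singledispatch is enough; a multiple-dispatch library like ovld would let the same idea extend to more arguments.)

    from functools import singledispatch

    import numpy as np
    import pandas as pd
    import torch

    @singledispatch
    def std(x):
        raise TypeError(f"no std() implementation for {type(x)!r}")

    @std.register
    def _(x: np.ndarray):
        return x.std(ddof=1)   # numpy defaults to ddof=0

    @std.register
    def _(x: pd.Series):
        return x.std()         # pandas already defaults to ddof=1

    @std.register
    def _(x: torch.Tensor):
        return x.std().item()  # torch already defaults to the sample estimator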
I know that dependent type is the term of art, and you should probably keep it. You could say something along the lines of
"ovld supports dependent types (an additionally specific name for a type that is based on its value, ie > 0)" the first time you use the term dependent types."
Example: turning a data structure into a JSON string, solved with multi dispatch: https://github.com/moritz/json/blob/master/lib/JSON/Tiny.pm#...
Another common useful example is constructors. A Date class might have constructors that accept
Date(Int, Int where 1..12, Int where 1..31) # Y, M, D
Date(Str where /^ \d**4 '-' \d**2 '-' \d**2 $/) # YYYY-MM-DD
Date(DateTime) # extract the date of a DateTime object
etc. to make it very intuitive to use.

One case where I'm finding it extremely useful is that I'm currently using ovld to implement a serialization library (https://github.com/breuleux/serieux, but I haven't documented anything yet). The arguments for deserialization are (declared_type, value, context). With multimethods, I can define
* deserialize(type[int], str, Any) to deserialize a string into an int
* deserialize(type[str], Regexp[r"^\$"], Environment) to deserialize $XYZ as an environment variable lookup but only if the context has the Environment type
* deserialize(type[Any], Path, WorkingDirectory) to deserialize from a file
That's one situation where ovld's performance and codegen capabilities come in handy: the overhead is low enough to remain in the same performance ballpark as Pydantic v2.
If the language is dynamically typed, you can use it for polymorphism.
For those who are wondering: they alias @typing.overload as @ovld only for type checkers.
Normally, @typing.overload is a way to define multiple function signatures with empty bodies so your type checker can work nicely with functions that, for example, only return a certain type when called with a certain signature. The actual implementation is left to you, and it usually involves a bunch of runtime type checks to branch your logic the right way.
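(To make that concrete, a generic made-up example of the usual pattern — `parse` is just an illustrative name:)

    from typing import overload

    @overload
    def parse(value: str) -> int: ...
    @overload
    def parse(value: bytes) -> float: ...

    def parse(value):
        # the single real implementation: the runtime branching is on you
        if isinstance(value, str):
            return int(value)
        if isinstance(value, bytes):
            return float(value)
        raise TypeError(f"unsupported type: {type(value)!r}")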
I think it would make sense to have each function return either the exact same type, or a generic.
@ovld(int) #all of these overloads should return an int
len(a:list) -> int
len(a:str) -> int
#the effective type of len is
len(a:Union[list, str]) -> int
and then a generic example:

T = TypeVar('T')
@ovld(T)
add(a:np.int8, b:np.int8) -> np.int8
@ovld(T)
add(a:np.int16, b:np.int16) -> np.int16
In the second example, I don't know how you would connect the input params to the generic T. If you put concrete types in, I see how ovld works. Am I missing something?
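(One way plain typing already connects the input params to the return type is a constrained TypeVar instead of one overload per width — a sketch of standard typing syntax, not of anything ovld actually accepts:)

    from typing import TypeVar
    import numpy as np

    T = TypeVar("T", np.int8, np.int16)

    def add(a: T, b: T) -> T:
        # the checker binds T per call site: int8s in -> int8 out, etc.
        ...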
I did this: https://news.ycombinator.com/item?id=43982570
linschn•1d ago
On the other hand, I can't stop myself from thinking about "Greenspun's tenth rule":
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
This doesn't apply directly here, as the features are intentional and it seems they are not bug ridden at all. But I get a nagging feeling of wanting to shout 'just use lisp!' when reading this.
https://wiki.c2.com/?MultiMethods
pansa2•1d ago
Julia definitely made the right choice to implement operators in terms of double-dispatch - it’s straightforward to know what happens when you write `a + b`. Whereas in Python, the addition is turned into a complex set of rules to determine whether to call `a.__add__(b)` or `b.__radd__(a)` - and it can still get it wrong in some fairly simple cases, e.g. when `type(a)` and `type(b)` are sibling classes.
I wonder whether Python would have been better off implementing double-dispatch natively (especially for operators) - could it get most of the elegance of Julia without the complexity of full multiple-dispatch?
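(A tiny, made-up illustration of the sibling-class case: both classes extend the same Base, so neither gets priority, the left operand's __add__ always runs, and Meters.__radd__ is never consulted — the units get silently dropped.)

    class Base:
        def __init__(self, value):
            self.value = value

    class Feet(Base):
        def __add__(self, other):
            # naively adds raw values, even if `other` is in meters
            return Feet(self.value + other.value)

    class Meters(Base):
        def __add__(self, other):
            if isinstance(other, Feet):
                other = Meters(other.value * 0.3048)
            return Meters(self.value + other.value)
        __radd__ = __add__

    print((Feet(3) + Meters(1)).value)  # 4 -- Feet.__add__ ran, units ignored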
StefanKarpinski•13h ago
Double dispatch feels like kind of a hack, tbh, but it is easier to implement and would certainly be an improvement over Python's awkward `__add__` and `__radd__` methods.
[1] https://janvitek.org/pubs/oopsla18b.pdf
upghost•1d ago
Having written one of these[1] a decade ago and inflicting it (with the best of intentions) upon production code in anger, I can tell you this often leads to completely unmaintainable code. It is impossible to predict the effect of changing a method, tracing a method, debugging a method (where do I put the breakpoint??).
The code reads beautifully though. Pray you never have to change it.
The reason I say "just use haskell" instead of lisp is that lisp generics suffer from this same problem.
Btw if anyone has a solution to this "generic/multidispatch maintainability in a dynamically typed language" problem I would love to hear it.
[1]: https://github.com/jjtolton/naga/blob/master/naga/tools.py
igouy•1d ago
Which kind-of question? "where do I put the breakpoint??"
Fr0styMatt88•1d ago
I like Python, but I like static typing too because there's just less to think about, and when I have to learn a new codebase there are a lot of assumptions I can lean on about how things work; this saves time.
I like the idea of Smalltalk and when you watch Alan Kay or Dan Ingalls talk about it, they make total sense and you have Pharo and Squeak to back it up as in “yes, you can build large systems with this idea”.
But I don't think you could program Smalltalk and have it be maintainable without everything else the environment brings. Being inside the environment with your objects. The totally different approach of sending a message to an object that doesn't understand it, having the debugger pop up, and then implementing that message right there. That's just an utterly different workflow.
I like the idea in ‘late binding of all things’ and I think the approach of writing a DSL for your problem and then having to write far less code to solve your problem is great. But the objection is then always “okay but what about when someone else has to work with that code”.
I guess what I'm trying to say is, the more dynamic your language is, the more support you need from your tooling to ease the cognitive load while you program, simply because the state-space of things you can do is bigger when you're not restricted by types, etc.
igouy•9h ago
https://gallium.inria.fr/~remy/poly/mot/10/nice/web/language...
> … but what about when someone else has to work with that code.

Someone else has had to work with that code since before Smalltalk escaped Xerox PARC.
1984 "Smalltalk-80 The Interactive Programming Environment" page 500
"At the outset of a project involving two or more programmers: Do assign a member of the team to be the version manager. … The responsibilities of the version manager consist of collecting and cataloging code files submitted by all members of the team, periodically building a new system image incorporating all submitted code files, and releasing the image for use by the team. The version manager stores the current release and all code files for that release in a central place, allowing team members read access, and disallowing write access for anyone except the version manager."
https://rmod-files.lille.inria.fr/FreeBooks/TheInteractivePr...
Later "ENVY/Developer":
https://archive.esug.org/HistoricalDocuments/TheSmalltalkRep...