Looks like it refers to this (I wish it were linked in the article):
> I used to believe the tailcalling interpreters get their speedup from better register use. While I still believe that now, I suspect that is not the main reason for speedups in CPython.
> My main guess now is that tail calling resets compiler heuristics to sane levels, so that compilers can do their jobs.
> Let me show an example, at the time of writing, CPython 3.15’s interpreter loop is around 12k lines of C code. That’s 12k lines in a single function for the switch-case and computed goto interpreter.
> […] In short, this overly large function breaks a lot of compiler heuristics.
> One of the most beneficial optimisations is inlining. In the past, we’ve found that compilers sometimes straight up refuse to inline even the simplest of functions in that 12k loc eval loop.
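For anyone who hasn't seen the pattern, here is a minimal sketch of what "tail calling" means here, not CPython's actual code. Every opcode gets its own small function, and each handler ends by tail-calling the next one, so the compiler optimizes many small functions instead of one 12k-line one. All names (`DEMO_MUSTTAIL`, `demo_run`, the three opcodes) are made up for illustration, and the macro falls back to a plain `return` on compilers without `musttail`, in which case the tail call is no longer guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: use musttail where the compiler has it, otherwise
   fall back to an ordinary return (no tail-call guarantee in that case). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define DEMO_MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef DEMO_MUSTTAIL
#  define DEMO_MUSTTAIL
#endif

/* Toy instruction set, invented for this example. */
enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Every handler shares one signature, which musttail requires. */
typedef int64_t (*demo_handler)(const uint8_t *pc, int64_t acc);

static int64_t op_inc(const uint8_t *pc, int64_t acc);
static int64_t op_double(const uint8_t *pc, int64_t acc);
static int64_t op_halt(const uint8_t *pc, int64_t acc);

static const demo_handler demo_table[OP_COUNT] = { op_inc, op_double, op_halt };

/* Read the next opcode and jump to its handler instead of returning
   to a central loop. */
#define DEMO_DISPATCH(pc, acc) \
    DEMO_MUSTTAIL return demo_table[*(pc)]((pc) + 1, (acc))

static int64_t op_inc(const uint8_t *pc, int64_t acc)
{
    DEMO_DISPATCH(pc, acc + 1);
}

static int64_t op_double(const uint8_t *pc, int64_t acc)
{
    DEMO_DISPATCH(pc, acc * 2);
}

static int64_t op_halt(const uint8_t *pc, int64_t acc)
{
    (void)pc;   /* nothing left to dispatch */
    return acc;
}

/* Entry point: start with an accumulator of 0 and dispatch the first opcode.
   The program must end with OP_HALT. */
int64_t demo_run(const uint8_t *program)
{
    assert(program != NULL);
    return demo_table[program[0]](program + 1, 0);
}
```

Running `demo_run` on `{ OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT }` walks through four handlers and returns the accumulator from `op_halt`. Each handler is tiny, so inlining and register-allocation heuristics see a normal-sized function, which is the effect the quote is guessing at.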
I'd have expected it to be hand-rolled assembly for the major ISAs, with a C fallback for less common ones.
How much energy has been wasted worldwide because of a relatively unoptimized interpreter?
# if defined(_MSC_VER) && !defined(__clang__)
# define Py_MUSTTAIL [[msvc::musttail]]
# define Py_PRESERVE_NONE_CC __preserve_none
# else
# define Py_MUSTTAIL __attribute__((musttail))
# define Py_PRESERVE_NONE_CC __attribute__((preserve_none))
# endif
https://github.com/python/cpython/pull/143068/files#diff-45b...
Apparently(?) this also needs to be attached to the function declarator and does not work as a function specifier: `static void *__preserve_none slowpath();` and not `__preserve_none static void *slowpath();` (unlike GCC attribute syntax, which tends to be fairly gung-ho about this sort of thing, sometimes with confusing results).
Yay to getting undocumented MSVC features disclosed if Microsoft thinks you’re important enough :/
> By 1977[2][3] the phrase had entered American usage as slang for the cum shot in a pornographic film
Edit: I read through it and have come to the conclusion that the post is 100% OK and properly framed. He explicitly says his approach is "sharing early and making a fool of myself," prioritizing transparency and rapid iteration over ironclad verification upfront.
One could make an argument that he should have cross-compiler checks, independent audits, or delayed announcements until results are bulletproof across all platforms. But given that he is 100% transparent with his thinking and how he works, it's all good in the hood.