
Python 3.15’s interpreter for Windows x86-64 should hopefully be 15% faster

https://fidget-spinner.github.io/posts/no-longer-sorry.html
400•lumpa•1mo ago

Comments

machinationu•1mo ago
The Python interpreter core loop sounds like the perfect problem for AlphaEvolve. Or its open-source equivalent, OpenEvolve, if DeepMind doesn't want to speed up Python for the competition.
g947o•1mo ago
> This has caused many issues for compilers in the past, too many to list in fact. I have a EuroPython 2025 talk about this.

Looks like it refers to this:

https://youtu.be/pUj32SF94Zw

(wish it's a link in the article)

eru•1mo ago
> (wish it's a link in the article)

I've asked Ken. He said he'll update the article.

Hendrikto•1mo ago
TLDR: The tail-calling interpreter is slightly faster than computed goto.

> I used to believe the tailcalling interpreters get their speedup from better register use. While I still believe that now, I suspect that is not the main reason for speedups in CPython.

> My main guess now is that tail calling resets compiler heuristics to sane levels, so that compilers can do their jobs.

> Let me show an example, at the time of writing, CPython 3.15’s interpreter loop is around 12k lines of C code. That’s 12k lines in a single function for the switch-case and computed goto interpreter.

> […] In short, this overly large function breaks a lot of compiler heuristics.

> One of the most beneficial optimisations is inlining. In the past, we’ve found that compilers sometimes straight up refuse to inline even the simplest of functions in that 12k loc eval loop.
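
A rough illustration of the structural difference being described, written in Python only because that is the language this thread is about. CPython's real loop is C, and Python has no guaranteed tail calls, so a small trampoline loop stands in for what `musttail` would do. The opcodes, the toy stack machine and all function names here are made up for the sketch.

```python
# Toy stack machine; opcodes and program are hypothetical, not CPython's.
PUSH, ADD, MUL, HALT = range(4)

PROGRAM = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]


def run_switch(program):
    """Style 1: one big function, every opcode handled inside a single loop."""
    stack, pc = [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()


# Style 2: one small function per opcode. Each handler does its work and
# hands back the next handler; in C it would tail-call that handler instead.
def dispatch(program, pc):
    op, arg = program[pc]
    return HANDLERS[op], arg, pc + 1


def op_push(stack, arg, program, pc):
    stack.append(arg)
    return dispatch(program, pc)


def op_add(stack, arg, program, pc):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return dispatch(program, pc)


def op_mul(stack, arg, program, pc):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)
    return dispatch(program, pc)


def op_halt(stack, arg, program, pc):
    # A handler of None tells the trampoline to stop; the result rides in `arg`.
    return None, stack.pop(), pc


HANDLERS = {PUSH: op_push, ADD: op_add, MUL: op_mul, HALT: op_halt}


def run_handlers(program):
    """Trampoline standing in for the chain of tail calls."""
    stack = []
    handler, arg, pc = dispatch(program, 0)
    while handler is not None:
        handler, arg, pc = handler(stack, arg, program, pc)
    return arg


print(run_switch(PROGRAM), run_handlers(PROGRAM))
```

Both styles compute the same thing. The point of the second is only that each handler is a small function a compiler can analyse on its own, rather than one arm of a 12k-line function.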

kccqzy•1mo ago
I think in the protobuf example the musttail did in fact benefit from better register use. All the functions are called with the same arguments, so there is no need to shuffle the registers. The same six register-passed arguments are reused from one function to the next.
cma•1mo ago
Does MSVC support computed goto?
mishrapravin441•1mo ago
Really nice results on MSVC. The idea that tail calls effectively reset compiler heuristics and unblock inlining is pretty convincing. One thing that worries me though is the reliance on undocumented MSVC behavior — if this becomes widely shipped, CPython could end up depending on optimizer guarantees that aren’t actually stable. Curious how you’re thinking about long-term maintainability and the impact on debugging/profiling.
kenjin4096•1mo ago
Thanks for reading! For now, we maintain all 3 of the interpreters in CPython. We don't plan to remove the other interpreters anytime soon, probably never. If MSVC breaks the tail calling interpreter, we'll just go back to building and distributing the switch-case interpreter. Windows binaries will be slower again, but such is life :(.

Also the interpreter loop's dispatch is autogenerated and can be selected via configure flags. So there's almost no additional maintenance overhead. The main burden is the MSVC-specific changes we needed to get this working (amounting to a few hundred lines of code).

> Impact on debugging/profiling

I don't think there should be any, at least for Windows. Though I can't say for certain.

mishrapravin441•1mo ago
That makes sense, thanks for the detailed clarification. Having the switch-case interpreter as a fallback and keeping the dispatch autogenerated definitely reduces the long-term risk.
pxeger1•1mo ago
Profile of llm generated comments
mishrapravin441•1mo ago
Just to clarify, I’m writing these comments myself. I do use a grammar LLM plugin to clean up phrasing, but the substance is mine.
develatio•1mo ago
If the author of this blog reads this: can we get an RSS feed, please?
kenjin4096•1mo ago
Got it. I'll try to set one up this weekend.
develatio•1mo ago
Thank you so much!!
redox99•1mo ago
This seems like very low hanging fruit. How is the core loop not already hyper optimized?

I'd have expected it to be hand rolled assembly for the major ISAs, with a C backup for less common ones.

How much energy has been wasted worldwide because of a relatively unoptimized interpreter?

kccqzy•1mo ago
Python’s goal is never really to be fast. If that were its goal, it would’ve had a JIT long ago instead of toying with optimizing the interpreter. Guido prioritized code simplicity over speed. A lot of speed improvements including the JIT (PEP 744 – JIT Compilation) came about after he stepped down.
davidkhess•1mo ago
Should probably mention that Guido ended up on the team working on a pretty credible JIT effort. Though Microsoft subsequently threw a wrench in it with layoffs. Not sure the status now.
IshKebab•1mo ago
If performance was a goal... hell if it was even a consideration then the language would be very different.
eru•1mo ago
You are mixing up eras.

For comparison: when Javascript was first designed, performance wasn't a goal. Later on, people who had performance as a goal worked on Javascript implementations. Thanks to heroic efforts, nowadays Javascript is one of the languages with decently fast implementations around. The base design of the language hasn't changed much (though how people use it might have changed a bit).

Python could do something similar.

mhh__•1mo ago
Python is full of decisions like this / or rather full of "if you just did some more work it'd be 10x better"
int_19h•1mo ago
I doubt it would have had a JIT a long time ago. Thing is, people have been making JIT compilers for Python for a long time now, but the semantics of the language itself are such that it's often hard to benefit from one, because most of the time isn't spent in the bytecode interpreter itself, it's spent dispatching things. People like comparing Python to JavaScript, but Python is much more flexible: all "primitive" types are objects that can be subclassed, for example, and even basic machinery like attribute lookups has a bunch of customization hooks.

So the problem is basically that a simple JIT is not beneficial for Python. So you have to invest a lot of time and effort to get a few percent faster on a typical workload. Or you have to tighten up the language and/or break the C ABI, but then you break many existing popular libraries.

pjmlp•1mo ago
Those people usually overlook the history of Smalltalk, Self and Common Lisp, which are just as dynamic if not more, due to image use, debugging and compilation on the fly where anything can be changed at any time.

For all its dynamism, Python doesn't have anything close to `becomes:`.

I would say that by now what is holding Python back is the C ABI and the culture that considers C code as Python.

eru•1mo ago
> People like comparing Python to JavaScript, but Python is much more flexible: all "primitive" types are objects that can be subclassed, for example, and even basic machinery like attribute lookups has a bunch of customization hooks.

Most of the time, people don't use any of these customisations, do they?

So you'd need machinery that makes the common path go fast, but can fall back onto the customised path, if necessary?

int_19h•1mo ago
Descriptors underpin some common language features like method calls (that's how `self` gets bound), properties etc. You can still do it by special casing all those, and making sure that the way you implement all those primitives works exactly as if it used descriptors, sure. But at this point it's not exactly a simple JIT anymore.
pjmlp•1mo ago
He was part of the driving effort after joining Microsoft though.
LtWorf•1mo ago
If you want fast just use pypy and forget about cpython.
mkoubaa•1mo ago
Software has gotten so slow we've forgotten how fast computers are
pjc50•1mo ago
This is (a) wildly over expectations for open source and (b) a massive pain to maintain, and (c) not even the biggest timewaster of python, which is the packaging "system".
loeg•1mo ago
> not even the biggest timewaster of python, which is the packaging "system".

For frequent, short-running scripts: start-up time! Every import has to scan a billion different directories for where the module might live, even for standard modules included with the interpreter.

tweakimp•1mo ago
In the near future we will use lazy imports :) https://peps.python.org/pep-0810/
theLiminator•1mo ago
This can't come soon enough. Python is great for CLIs until you build something complex and a simple --help takes seconds. It's not something easily worked around without making your code very ugly.
peterfirefly•1mo ago
It's not that hard to handle --help and --version separately before importing anything.
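
A minimal sketch of that approach, assuming a command-line tool whose real work needs a slow import. `VERSION`, `USAGE` and `main` are hypothetical names, and `json` is only a stand-in for a heavy dependency.

```python
import sys

VERSION = "0.1.0"  # hypothetical version string
USAGE = "usage: tool [--help] [--version] [args...]"  # hypothetical usage text


def main(argv):
    # Cheap flags are answered before any expensive import happens.
    if "--help" in argv or "-h" in argv:
        return USAGE
    if "--version" in argv:
        return VERSION

    # Deferred import: only paid for when real work is requested.
    # `json` stands in for a heavy third-party dependency.
    import json

    return json.dumps({"args": list(argv)})


if __name__ == "__main__":
    print(main(sys.argv[1:]))
```

The cost is the one mentioned upthread: imports end up inside functions instead of at the top of the module.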
eru•1mo ago
You could, but it doesn't really seem all that useful? I mean, when are you ever going to run this in a hot loop?
eru•1mo ago
> [...] not even the biggest timewaster of python, which is the packaging "system".

The new `uv` is making good progress there.

WD-42•1mo ago
Probably because anyone concerned with performance wasn’t running workloads on Windows to begin with.
loeg•1mo ago
They weren't using Python, anyway.
pjmlp•1mo ago
Games and Proton.

Apparently people that care about performance do run Windows.

nilamo•1mo ago
Games are made for windows because that's where the device drivers have historically been. Any other viewpoint is ignoring reality.
pjmlp•1mo ago
Sure, keep believing that while loading Proton.
throw-12-16•1mo ago
Gladly.
eru•1mo ago
> Any other viewpoint is ignoring reality.

Eh, what about users? Games are made for windows, because that's where users (= players) are?

That's even more true for mobile and console games.

WD-42•1mo ago
None of those games, or a very small amount of them, are written in python. None of the ones that need to be performant for sure.
pjmlp•1mo ago
Indeed, but the question was about performance in general.
int_19h•1mo ago
Games aren't written in Python as a whole, but Python is used as a scripting language. It's definitely less popular now than it used to be, mostly thanks to Lua, but it still happens.
mikkupikku•1mo ago
How many games use python for scripting and stay up to date with the version of python they're embedding? My guess is zero.
eru•1mo ago
Doesn't seem all that relevant? New games will benefit from faster Python.
whatevaa•1mo ago
But not python.
pjmlp•1mo ago
Sure, but that wasn't the question.
NetMageSCW•1mo ago
Plenty of DAWs, image editing and video editing being done on Windows.
Calavar•1mo ago
Quite to the contrary, I'd say this update is evidence of the inner loop being hyperoptimized!

MSVC's support for musttail is hot off the press:

> The [[msvc::musttail]] attribute, introduced in MSVC Build Tools version 14.50, is an experimental x64-only Microsoft-specific attribute that enforces tail-call optimization. [1]

MSVC Build Tools version 14.50 was released last month, and it only took a few weeks for the CPython crew to turn that around into a performance improvement.

[1] https://learn.microsoft.com/en-us/cpp/cpp/attributes?view=ms...

pwarner•1mo ago
I remember a former colleague, (may he RIP) ported a similar optimization to our fork of Python 2.5, circa 2007. We were running Linux on PPC and it gave us that similar 10-15% boost at the time.
mananaysiempre•1mo ago
The money shot (wish this were included in the blog post):

  #   if defined(_MSC_VER) && !defined(__clang__)
  #      define Py_MUSTTAIL [[msvc::musttail]]
  #      define Py_PRESERVE_NONE_CC __preserve_none
  #   else
  #      define Py_MUSTTAIL __attribute__((musttail))
  #      define Py_PRESERVE_NONE_CC __attribute__((preserve_none))
  #   endif

https://github.com/python/cpython/pull/143068/files#diff-45b...

Apparently(?) this also needs to be attached to the function declarator and does not work as a function specifier: `static void *__preserve_none slowpath();` and not `__preserve_none static void *slowpath();` (unlike GCC attribute syntax, which tends to be fairly gung-ho about this sort of thing, sometimes with confusing results).

Yay to getting undocumented MSVC features disclosed if Microsoft thinks you’re important enough :/

publicdebates•1mo ago
Important enough, or benefits them directly? I have no good guesses how improving Python's performance would benefit them, but I would guess that's the real reason.
HPsquared•1mo ago
I wonder if this is related to Python in Excel. You'll have lots of people running numerical stuff written in Python, running on Microsoft servers.
mkoubaa•1mo ago
A lot of commercial engineering and scientific software runs on windows.
andix•1mo ago
I guess there are some Python workloads on Azure, Microsoft provides a lot of data analysis and LLM tools as a service (not paid by CPU minutes). Saving CPU cycles there directly translates to financial savings.
acdha•1mo ago
Think about how much effort they have put into things like Pylance and general Python support in VS Code. Clearly they think they have enough users for whom this matters that a first-class experience is worth having.
pjmlp•1mo ago
Microsoft was the one hiring Guido out of retirement, and alongside Facebook finally kicking off the CPython JIT efforts.

Python is one of the Microsoft blessed languages on their devblogs.

throw1ahs•1mo ago
The project was first suggested by Mark Shannon. Van Rossum inserted himself into the project. Faster CPython people have been fired by Microsoft last year.

Generally not that much has happened in 5 years, sometimes 10-15% improvements are posted that are later offset by bloat.

I think the project started in 3.10, so 3.9 is the last version to compare to. The improvements aren't that great, I don't think any other language would get so much positive feedback for so little.

pjmlp•1mo ago
I know what happened last year; my point was the prior history that led to that effort.

https://thenewstack.io/guido-van-rossums-ambitious-plans-for...

Agree with the sentiment; Python is the only dynamic language that seems to have a graveyard of such efforts.

And nope it isn't the dynamism per se, Smalltalk, Self, Common Lisp are just as dynamic, with lots of possibilities to reboot the world and mess up JIT efforts, as any change impacts the whole image.

Naturally those don't have internals exposed to C where anything goes, or a culture in which C libraries are seen as the language's libraries.

boulos•1mo ago
Ehh, PHP fits that bill and is clearly optimizable. All sorts of things worked well for PHP, including the original HipHop, HHVM, my own work, and the mainline PHP runtime.

Python has some semantics and behaviors that are particularly hostile to optimization, but as the Faster Python and related efforts have suggested, the main challenge is full compatibility including extensions plus the historical desire for a simple implementation within CPython.

There are limits to retrofitting truly high performance to any of these languages. You want enough static, optional, or gradual typing to make it fast enough in the common case. That's why you also saw the V8 folks give up and make Dart, the Facebook ones made Hack, etc. It's telling that none of those gained truly broad adoption though. Performance isn't all that matters, especially once you have an established codebase and ecosystem.

gsnedders•1mo ago
> Performance isn't all that matters, especially once you have an established codebase and ecosystem.

And this is no small part of why Java and JS have frequently been pushing VM performance forward — there’s enough code people very much care about continuing to work on performance. (Though the two care about different things mostly: Java cares much more about long-term performance, and JS cares much more about short-term performance.)

It doesn’t hurt they’re both languages which are relatively static compared with e.g. Python, either.

titzer•1mo ago
> you also saw the V8 folks give up and make Dart

V8 still got substantially faster after the first team left to do Dart. A lot of runtime optimizations (think object model optimizations), several new compilers, and a lot of GC work.

It's a huge investment to make a dynamic language go as fast as JS these days.

eru•1mo ago
> It's a huge investment to make a dynamic language go as fast as JS these days.

Yes, and on the other hand, other language implementations like CPython can learn from everything people figured out for JS.

kenjin4096•1mo ago
> Generally not that much has happened in 5 years, sometimes 10-15% improvements are posted that are later offset by bloat.

Sorry but unless your workload is some C API numpy number cruncher that just does matmuls on the CPU, that's probably false.

In 3.11 alone, CPython sped up by around 25% over 3.10 on pyperformance for x86-64 Ubuntu. https://docs.python.org/3/whatsnew/3.11.html#whatsnew311-fas...

3.14 is 35-45% faster than CPython 3.10 for pyperformance x86-64 Ubuntu https://github.com/faster-cpython/benchmarking-public

These speedups have been verified by external projects. For example, a Python MLIR compiler that I follow has found a geometric mean 36% speedup moving from CPython 3.10 to 3.11 (page 49 of https://github.com/EdmundGoodman/masters-project-report)

Another academic benchmark here observed an around 1.8x speedup on their benchmark suite for 3.13 vs 3.10 https://youtu.be/03DswsNUBdQ?t=145

CPython 3.11 sped up enough that PyPy in comparison looks slightly slower. I don't know if anyone still remembers this: but back in the CPython 3.9 days, PyPy had over 4x speedup over CPython on the PyPy benchmark suite, now it's 2.8 on their website https://speed.pypy.org/ for 3.11.

Yes CPython is still slow, but it's getting faster :).

Disclaimer: I'm just a volunteer, not an employee of Microsoft, so I don't have a perf report to answer to. This is just my biased opinion.

tom_•1mo ago
As a data point, running a Python program I've been working on lately, which is near enough entirely Python code, with a bit of I/O: (a prototype for some code I'll ultimately be writing in a lower-level language)

(macOS Ventura, x64)

- System python 3.9.6: 26.80s user 0.27s system 99% cpu 27.285 total

- MacPorts python 3.9.25: 23.83s user 0.32s system 98% cpu 24.396 total

- MacPorts python 3.13.11: 15.17s user 0.28s system 98% cpu 15.675 total

- MacPorts python 3.14.2: 15.31s user 0.32s system 98% cpu 15.893 total

Wish I'd thought to try this test sooner now. (I generally haven't bothered with Python upgrades much, on the basis that the best version will be the one that's easiest to install, or, better yet, is there already. I'm quite used to the language and stdlib as they are, and I've just assumed the performance will still be as limited as it always has been...!)

llimllib•1mo ago
I have a benchmark program I use, a solution to day 5 of the 2017 advent of code, which is all python and negligible I/O. It still runs 8.8x faster on pypy than on python 3.14:

    $ hyperfine "mise exec python@pypy3.11 -- python e.py" "mise exec python@3.9 -- python e.py" "mise exec python@3.11 -- python e.py" "mise exec python@3.14 -- python e.py"
    Benchmark 1: mise exec python@pypy3.11 -- python e.py
      Time (mean ± σ):     148.1 ms ±   1.8 ms    [User: 132.3 ms, System: 17.5 ms]
      Range (min … max):   146.7 ms … 154.7 ms    19 runs

    Benchmark 2: mise exec python@3.9 -- python e.py
      Time (mean ± σ):      1.933 s ±  0.007 s    [User: 1.913 s, System: 0.023 s]
      Range (min … max):    1.925 s …  1.948 s    10 runs
     
    Benchmark 3: mise exec python@3.11 -- python e.py
      Time (mean ± σ):      1.375 s ±  0.011 s    [User: 1.356 s, System: 0.022 s]
      Range (min … max):    1.366 s …  1.403 s    10 runs
     
    Benchmark 4: mise exec python@3.14 -- python e.py
      Time (mean ± σ):      1.302 s ±  0.003 s    [User: 1.284 s, System: 0.022 s]
      Range (min … max):    1.298 s …  1.307 s    10 runs
     
    Summary
      mise exec python@pypy3.11 -- python e.py ran
        8.79 ± 0.11 times faster than mise exec python@3.14 -- python e.py
        9.28 ± 0.13 times faster than mise exec python@3.11 -- python e.py
       13.05 ± 0.16 times faster than mise exec python@3.9 -- python e.py
https://gist.github.com/llimllib/0eda0b96f345932dc0abc2432ab...
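
The ratios in the hyperfine summary above can be re-derived from the reported means. This is only arithmetic on the numbers printed in that output; the variable names are my own.

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
mean_seconds = {
    "pypy3.11": 0.1481,
    "3.9": 1.933,
    "3.11": 1.375,
    "3.14": 1.302,
}

baseline = mean_seconds["pypy3.11"]

# How many times slower each CPython version is than PyPy on this script.
slowdown_vs_pypy = {
    name: round(seconds / baseline, 2)
    for name, seconds in mean_seconds.items()
    if name != "pypy3.11"
}

# CPython-only comparison: how much faster 3.14 is than 3.9 on this script.
cpython_speedup = round(mean_seconds["3.9"] / mean_seconds["3.14"], 2)

print(slowdown_vs_pypy)
print(cpython_speedup)
```

On this one script the gap to PyPy narrows from roughly 13x to roughly 9x between 3.9 and 3.14, while CPython itself gets about 1.5x faster.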
eru•1mo ago
> [...] and I've just assumed the performance will still be as limited as it always has been...!)

Historically CPython performance has been so bad, that massive speedups were quite possible, once someone seriously got into it.

tom_•1mo ago
And indeed that has proven the case. But my assumption was that Python had been so obviously designed with performance so very much not in mind, that it had ended up in some local minimum from which meaningful escape would be impossible. But I didn't overthink this opinion, and I've always liked Python well enough for small programs anyway, so I don't mind having it proven wrong.
__turbobrew__•1mo ago
How to stay employed for life: create a programming language which is pretty good, but with some fatal flaws (GIL, typing, slow) and you are set for life.
johncolanduoni•1mo ago
I don’t know about calling them fatal, but inculcating a culture that believes the flaws are inescapable laws of reality is probably key.
nurettin•1mo ago
+ It is also blessed by PowerBI and recently, Excel.
kenjin4096•1mo ago
So it seems I was wrong, [[msvc::musttail]] is documented! I will update the blog post to reflect that.

https://news.ycombinator.com/item?id=46385526

bgwalter•1mo ago
MSVC mostly generates slower code than gcc/clang, so maybe this trick reduces the gap.
metaltyphoon•1mo ago
Is this backed by real evidence?
bluecalm•1mo ago
My experience is 10%-15% slower than GCC. That was 10 years ago though.
pjmlp•1mo ago
Imagine how much faster those Windows and Xbox games would be if they used gcc/clang. /s
jtrn•1mo ago
I'm a bit out of the loop on this, but I hope it's not like that time with Python 3.14, when a geometric mean speedup of about 9-15% over the standard interpreter was claimed for builds with Clang 19. It turned out the results were inflated by a bug in LLVM 19 that prevented proper "tail duplication" optimization in the baseline interpreter's dispatch loop. Actual gains were approximately 4%.

Edit: Read through it and have come to the conclusion that the post is 100% OK and properly framed: He explicitly says his approach is "sharing early and making a fool of myself," prioritizing transparency and rapid iteration over ironclad verification upfront.

One could make an argument that he should have cross-compiler checks, independent audits, or delayed announcements until results are bulletproof across all platforms. But given that he is 100% transparent with his thinking and how he works, it's all good in the hood.

kenjin4096•1mo ago
Thanks :), that was indeed my intention. I think the previous 3.14 mistake was actually a good one in hindsight, because if I hadn't publicized our work early, I wouldn't have caught the attention of Nelson. Nelson also probably wouldn't have spent one month digging into the Clang 19 bug. That would have meant the bug wasn't caught in the betas, and it might've gone out with the actual release, which would have been way worse. So this was all a happy accident in hindsight that I'm grateful for, as it means overall CPython still benefited!

Also this time, I'm pretty confident because there are two perf improvements here: the dispatch logic, and the inlining. MSVC can actually convert switch-case interpreters to threaded code automatically if some conditions are met [1]. However, it does not seem to do that for the current CPython interpreter. In this case, I suspect the CPython interpreter loop is just too complicated to meet those conditions. The key point is also that we would be relying on MSVC again to do its magic, whereas this tail calling approach gives more control to the writers of the C code. The inlining is pretty much impossible to convince MSVC to do except with `__forceinline` or changing things to use macros [2]. However, we don't just mark every function as forceinline in CPython as it might negatively affect other compilers.

[1]: https://github.com/faster-cpython/ideas/issues/183 [2]: https://github.com/python/cpython/issues/121263

jtrn•1mo ago
I wish all self-promoting scientists and sensationalizing journalists had a fraction of the honesty and dedication to actual truth and proper communication of truths as you do. You seem to feel that it’s more important to be transparent about these kinds of technical details than other people are about their claims in clinical medical research. Thank you so much for all you do and the way you communicate about it.

Also, I’m not that familiar with the whole process, but I just wanted to say that I think you were too hard on yourself during the last performance drama. So thank you again and remember not to hold yourself to an impossible standard no one else does.

kenjin4096•1mo ago
Thank you very much for the kind words, that means a lot to me!
halflings•1mo ago
+1, reading through the post, the PR updating the documentation... thanks for being transparent, but also don't be so hard on yourself!

That was a very niche error, that you promptly corrected, no need to be so apologetic about it! And thanks for all the hard work making Python faster!

haberman•1mo ago
I’ll repeat what I said at that time: one of the benefits of the new design is that it’s less vulnerable to the whims of the optimizer: https://news.ycombinator.com/item?id=43322451

If getting the optimal code relies on getting a pile of heuristics to go in your favor, you’re more vulnerable to the possibility that someday the heuristics will go the other way. Tail duplication is what we want in this case, but it’s possible that a future version of the compiler could decide that it’s not desired because of the increased code size.

With the new design, the Python interpreter can express the desired shape of the machine code more directly, leaving it less vulnerable to the whims of the optimizer.

kenjin4096•1mo ago
Yeah, I believe that statement and it seems to hold true for MSVC as well. Thanks for your work inspiring all of this btw!
acemarke•1mo ago
I've never seen this kind of benchmark graph before, and it looks really neat! How was this generated? What tool was used for the benchmarks?

(I actually spent most of Sep/Oct working on optimizing the Immer JS immutable update library, and used a benchmarking tool called `mitata`, so I was doing a lot of this same kind of work: https://github.com/immerjs/immer/pull/1183 . Would love to add some new tools to my repertoire here!)

eesmith•1mo ago
Are you referring to the violin plot? https://en.wikipedia.org/wiki/Violin_plot and in Matplotlib as https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot....

It's in essence a histogram for the distribution, with smoothing, and mirrored on each side.

It looks nice, but is not without well-deserved opposition because 1) the use of smoothing can hide the actual distribution, 2) mirroring contains no extra information, while taking up space, and implying the extra space contains information, and 3) when shown vertically, too often causes people to exclaim it looks like a vulva.

In an HN discussion on the topic, medstrom at https://news.ycombinator.com/item?id=40766519 points to a half-violin plot at https://miro.medium.com/v2/1*J3Q4JKXa9WwJHtNaXRu-kQ.jpeg with the histogram on the left, and the half-violin on the right, which gives you a chance to see side-by-side presentation of the same data.
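
To make the smoothing objection concrete, here is a sketch of the Gaussian kernel density estimate that a violin plot typically draws, in plain Python with no plotting. The sample values and bandwidths are made up, and `gaussian_kde` is my own helper, not a library function.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function giving the smoothed density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up timings with two clear clusters, around 1.0 and around 3.0.
samples = [1.0, 1.1, 0.9, 1.05, 3.0, 3.1, 2.9, 3.05]

narrow = gaussian_kde(samples, bandwidth=0.1)
wide = gaussian_kde(samples, bandwidth=1.0)

# With a narrow bandwidth the gap between the clusters is visible...
gap_visible = narrow(2.0) < narrow(1.0)
# ...with a wide one the estimate peaks in the gap, where no data exists.
gap_hidden = wide(2.0) > wide(1.0)

print(gap_visible, gap_hidden)
```

A violin plot then draws `density(x)` on both sides of the axis, so the second half repeats the first.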

Tarq0n•1mo ago
Histograms aren't necessarily a true depiction of the distribution. Bin count or width has a large impact on what details get shown.
eesmith•1mo ago
Sure. Very few distributions have lovely square edges, which otherwise indicate some very high frequencies in the distribution, or quantized values.

But that also means we are used to seeing histograms and their bin counts and widths, and can use them to estimate possible variances from the true distribution.

While it's much harder to do the same with violin plots.

eru•1mo ago
You could plot the cumulative distribution function to avoid these problems with histograms.
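
For illustration, an empirical CDF needs no bin or smoothing parameter at all. This is a minimal stdlib sketch; the `ecdf` helper and the sample values are hypothetical:

```python
from bisect import bisect_right

def ecdf(samples):
    """Return F, where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F

# Hypothetical timings (ms) forming two clusters.
samples = [10, 11, 11, 12, 12, 12, 17, 18, 18, 19]
F = ecdf(samples)

# A flat stretch (F(12) == F(16)) marks a gap in the data.
print(F(12), F(16), F(19))
```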
forrestthewoods•1mo ago
Is there a Clang based build for Windows? I’ve been slowly moving my Windows builds from MSVC to Clang. Which still uses the Microsoft STL implementation.

So far I think using clang instead of MSVC compiler is a strict win? Not a huge difference mind you. But a win nonetheless.

gozzoo•1mo ago
I have a question, slightly off topic but related. I was wondering: why is the Python interpreter so much slower than the V8 JavaScript interpreter when both JavaScript and Python are dynamic interpreted languages?
bheadmaster•1mo ago
I can think of two possible reasons:

First is Google's manpower. Google somehow succeeds in writing fast software. Most Google products I use are fast in contrast to the rest of the ecosystem. It's possible that Google simply did a better job.

The second is CPython legacy. There are faster implementations of Python that completely implement the API (PyPy comes to mind), but there's a huge ecosystem of C extensions written with CPython bindings, which make it virtually impossible to break compatibility. It is possible that this legacy prevents many possible optimizations. On the other hand, V8 only needs to keep compatibility on code-level, which allows them to practically switch out the whole inside in incremental search for a faster version.

I might be wrong, so take what I said with a grain of salt.

canucker2016•1mo ago
Don't forget that there was a Google attempt at making a faster Python - Unladen Swallow. It got lots of PR but never merged with mainline CPython (wikipedia says a dev branch was released).

see https://en.wikipedia.org/wiki/Unladen_Swallow

pansa2•1mo ago
Unladen Swallow got a lot of hype but was only a very small project. IIRC the only people working on it were two interns.

V8 was a much higher priority - Google hired many of the world’s best VM engineers to develop it.

pjmlp•1mo ago
Some of them, like Lars Bak, have a background going back to the Self VM, and Self is a language much more dynamic than Python.

Anything goes regarding changing object shapes, it is one step further than Smalltalk in language plasticity.

everforward•1mo ago
I know of a couple reasons offhand.

JavaScript is JIT’ed where CPython is not. Pypy has JIT and is faster, but I think is incompatible with C extensions.

I think Python's threading model also adds complexity to optimizing, where JavaScript's single thread is easier to optimize.

I would also say there’s generally less impetus to optimize CPython. At least until WASM, JavaScript was sort of stuck with the performance the interpreter had. Python had more off-ramps. You could use pypy for more pure Python stuff, or offload computationally heavy stuff to a C extension.

I think there are some language differences that make JavaScript easier to optimize, but I’m not super qualified to speak on that.

pansa2•1mo ago
> I would also say there’s generally less impetus to optimize CPython

Nonetheless, Microsoft employed a whole "Faster CPython" team for 4 years - they targeted a 5x speedup but could only achieve ~1.5x. Why couldn't they make a significantly faster Python implementation, especially given that PyPy exists and proves it's possible?

everforward•1mo ago
Pypy has much slower C interop than CPython, which I believe is part of the tradeoff. Eg data analysis pipelines are probably still faster in numpy on CPython than pypy.

Not an expert here, but my understanding is that Python is dynamic to the point that optimizing is hard. Like allowing one namespace to modify another; last I used it, the Stackdriver logging adapter for Python would overwrite the stdlib logging library. You import stackdriver, and it changes logging to send logs to stackdriver.

All package level names (functions and variables) are effectively global, mutable variables.

I suspect a dramatically faster Python would involve disabling some of the more unhinged mutability. Eg package functions and variables cannot be mutated, only wrapped into a new variable.
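
A small sketch of the kind of cross-namespace mutation described above, using the stdlib `math` module as a stand-in (this is not the Stackdriver adapter, and `hypotenuse` is a made-up example function):

```python
import math

def hypotenuse(a, b):
    # `math.sqrt` is looked up in the module's namespace on every call.
    return math.sqrt(a * a + b * b)

before = hypotenuse(3, 4)

original_sqrt = math.sqrt
math.sqrt = lambda x: -1.0  # any code that imports `math` can do this
try:
    after = hypotenuse(3, 4)  # same function, different behaviour
finally:
    math.sqrt = original_sqrt  # put the real function back

print(before, after)
```

Because the rebinding is visible to every caller, an optimizer cannot assume `math.sqrt` still means what it meant a moment ago without some kind of guard.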

pjmlp•1mo ago
See Smalltalk, Self and Common Lisp, and you will find languages that are even more dynamic than Python, and are in the genesis of high performance JIT research.
igouy•1mo ago
fwiw

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

_ZeD_•1mo ago
keep in mind that, apart from the money thrown at js runtime interpreters by google and others, there is also the fact that python - as a language - is way more "dynamic" than javascript.

Even "simple" stuff like field access in python may refer to multiple dynamically-mapped method resolution.

Also, the ffi-bindings of python, while offering a way to extend it with libraries written in c/c++/fortran/... , limit how freely the internals can be changed (see the bug-by-bug compatibility work done for example by pypy, just to name an example, with some constraint that limit some optimizations)

pansa2•1mo ago
> python - as a language - is way more "dynamic" than javascript

Very true, but IMO the existence of PyPy proves that this doesn't necessarily prevent a fast implementation. I think the reason for CPython's poor performance must be your other point:

> the ffi-bindings of python [...] limit how freely the internals can be changed

eru•1mo ago
> Very true, but IMO the existence of PyPy proves that this doesn't necessarily prevent a fast implementation.

PyPy pays for this by having slower C interaction.

nikisweeting•1mo ago
genuinely curious, don't JS's proxy objects and prototype-based MRO have a similar performance impact in theory?
cpburns2009•1mo ago
Yeah, I don't see how Python is fundamentally different from JavaScript as far as dynamism goes. Sure, Python has operator overloading, but JavaScript would implement those as regular methods. Python's init & new aren't any more convoluted than JavaScript's constructors. Python may support multiple inheritance, but method and attribute resolution just uses the MRO, which is no different than JavaScript's prototype chain.
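
To make the MRO point concrete, here is a minimal sketch with made-up class names. Even with multiple inheritance, the lookup order is a single linear list that is walked front to back, much like a prototype chain:

```python
class Base:
    def who(self):
        return "Base"

class Left(Base):
    def who(self):
        return "Left"

class Right(Base):
    def who(self):
        return "Right"

class Child(Left, Right):
    pass  # defines nothing itself, so lookup falls through to the MRO

mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)
print(Child().who())  # first class in the MRO that defines `who` wins
```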
pjmlp•1mo ago
Urban myths.

Most people who parrot Python's dynamism as the root cause have never used Smalltalk, Self or Common Lisp, or even PyPy for that matter.

pjmlp•1mo ago
See Smalltalk, Self and Common Lisp for highly dynamic languages with good enough JIT, the first two having their research contributed to Hotspot and V8.
dragonwriter•1mo ago
> why is pyhton interpreter so much slower than V8 javascript interpreter when both javascript and python are dynamic interpreted languages.

Because JS’s centrality to the web and V8’s speed’s centrality to Google’s push to avoid other platform owners controlling the web via platform-default browsers meant virtually unlimited resources were spent in optimizing V8 at a time when the JS language itself was basically static; Python has never had the same level of investment and has always spent some of its smaller resources on advancing the language rather than optimizing the implementation.

Also, because the JS legacy that needed to be supported through that is pure JS, whereas with CPython there is also a considerable ecosystem of code that deeply integrates with Python from the outside that must still be supported, and the interface used by that code limits the optimizations that can be applied. Faster Python interpreters exist that don’t support that external ecosystem, but they are less used because that ecosystem is a big part of Python’s value proposition.

IshKebab•1mo ago
Even though Javascript is quite dynamic, Python is much worse. Basically everything involves a runtime look-up. It's pretty much the language you'd design if you were trying to make it as slow as possible.
pjmlp•1mo ago
Just like Smalltalk and Self.

Which can change on the fly anything that is currently executing in the image.

Also after breaking into the debugger, the world can be totally different after resuming execution at the trap location.

Then there are nice primitives like a becomes: b. where all occurrences of a get swapped with b.

int_19h•1mo ago
Python is much, much more dynamic. E.g. look at how something as basic as accessing an attribute on an object works: https://docs.python.org/3/howto/descriptor.html

Also Python has a de facto stable(ish) C ABI for extensions that is 1) heavily used by popular libraries, and 2) makes life more difficult for the JIT because the native code has all the same expressive power wrt Python objects, but JIT can't do code analysis to ensure that it doesn't use it.
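
A rough sketch of the lookup order that the descriptor howto describes. All class and attribute names here are invented for the example:

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it outranks the instance dict."""
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDescriptor:
    """Defines only __get__, so the instance dict outranks it."""
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Thing:
    loud = DataDescriptor()
    quiet = NonDataDescriptor()

    def __getattr__(self, name):
        # Only called after the normal lookup has failed.
        return f"fallback for {name}"

t = Thing()
# Write straight into the instance dict, bypassing __set__.
t.__dict__["loud"] = "from instance dict"
t.__dict__["quiet"] = "from instance dict"

print(t.loud)     # data descriptor on the class wins
print(t.quiet)    # instance dict wins
print(t.missing)  # falls back to __getattr__
```

So a plain `t.x` has to consult the type, the instance dict and possibly a fallback hook before it can produce a value.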

Quitschquat•1mo ago
Tbh, 15% faster than slow AF is still slow AF
dingdingdang•1mo ago
Yup, but 5 to 15% faster year on year is real progress and that's ultimately what the big user base of Python are counting on at this point.. and they seem to be getting it! Full disclaimer: I'm not a heavy Python user exactly due to the performance and build/distribution situation - it's just sad from a user-end perspective (I'm not addressing centralised web deployment here but rather decentralised distribution which I ultimately find more "real" and rewarding).
horizion2025•1mo ago
I don't understand this focus on micro performance details... considering that all of this is about an interpretation approach which is always going to be slow relatively speaking. The big speed up would be to JIT it all, then you don't need to care about structuring of switch loops etc
int_19h•1mo ago
You'd be surprised at how little speedup you get from simply JIT-compiling the Python bytecode. It's so high-level that most interesting stuff happens in the layers below anyway.
horizion2025•1mo ago
But if that is so why this focus on the few clock cycles of dispatch?
int_19h•1mo ago
Because it is a fairly easy thing - it's a code transform that's mostly mechanical. And it also improves code quality, unusual for an optimization. So if that nets you those extra few percent, why not?
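
A toy illustration of the two shapes, written in Python only to show the structure. CPython's real interpreter is C, the opcodes and function names below are invented, and Python itself does not eliminate tail calls, so the second style here still grows the stack:

```python
def run_switch(program):
    """Style 1: one loop around one big switch."""
    stack, pc = [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "HALT":
            return stack.pop()
        else:
            raise ValueError(op)

# Style 2: one function per opcode, each ending by calling the next handler.
def op_push(program, pc, stack):
    stack.append(program[pc][1])
    return dispatch(program, pc + 1, stack)

def op_add(program, pc, stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return dispatch(program, pc + 1, stack)

def op_halt(program, pc, stack):
    return stack.pop()

HANDLERS = {"PUSH": op_push, "ADD": op_add, "HALT": op_halt}

def dispatch(program, pc, stack):
    # Look up the handler for the current opcode and hand over control.
    return HANDLERS[program[pc][0]](program, pc, stack)

program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("HALT", None)]
print(run_switch(program), dispatch(program, 0, []))
```

Each branch of the switch becomes its own small function, which is the mostly mechanical part of the transform.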
eab-•1mo ago
My understanding is that this tail call based interpretation is also kinder to the branch predictor. I wonder if this explains some of the slowdowns - they trigger specific cases that cause lots of branch mispredictions.
DrewADesign•1mo ago
After years of admonition discouraging me, I’m using Python for a Windows GUI app over my usual C#/MAUI. I’m much more familiar with Python and the whole VS ecosystem is just so heavy for lightweight tasks. I started with tkinter but found it super clunky for interactions I needed heavily, like on field change, but learning QT seemed like more of a lift than I was interested in. (Maybe a skill issue on both fronts?) Grabbed wxglade and drag-and-dropped an interface with wxpython that only has one external dependency installable with pip, is way more convenient than writing xaml by hand, and ergonomically feels pretty pythonic compared to QT. Glad to see more work going into the windows runtime because I’ll probably be leaning on it more.
halfcat•1mo ago
Wait until you see ImGui bindings for Python [1]. It’s immediate mode instead of retained mode like Tkinter/Qt/Wx. It might not be what you’d want if you’re shipping a thick client to customers, but for internal tooling it’s awesome.

    imgui.text(f"Counter = {counter}")
    if imgui.button("increment counter"):
        counter += 1

    _, name = imgui.input_text("Your name?", name)
    imgui.text(f"Hello {name}!")

[1] https://github.com/pthom/imgui_bundle
DrewADesign•1mo ago
This looks like it would be perfect for the internal user that really just needs to run a shell script with options who’s in the “technical enough to follow instructions faithfully, not technical enough to comfortably/reliably use the command line” demographic.
stinos•1mo ago
ImGui has been on my watchlist for years, and recently I finally had an application where it seemed I could put it to use. It essentially delivered on all points I hoped it would. After decades in software, it doesn't happen often anymore that I'm impressed, but this time I was.
NetMageSCW•1mo ago
Depending on how important the GUI is to you, I would look into LINQPad for stuff that is scripting but too heavy.
DrewADesign•1mo ago
Looks neat!
dima55•1mo ago
Look at pyfltk also. I haven't used the windows builds, but it's real nice on GNU/Linux.
ktm5j•1mo ago
I really like the Python + Qt/pyside combination. I can whip together a rough GUI using QtCreator and then write the app logic in Python super quickly.
DrewADesign•1mo ago
I’m sure it would be a goto if I made gui apps more regularly, because it’s clearly the more robust solution. So far wxglade is great for a drag-and-drop designer and the code is just enough closer to the regular Python way of doing things that it’s one less thing to learn.
pjmlp•1mo ago
Well, using MAUI instead of Avalonia or Uno was the mistake.
DrewADesign•1mo ago
Yeah I’d have made a more deliberate choice if it took up more of my dev time. I haven’t looked at Uno really though.
DrewADesign•1mo ago
If anyone stumbles upon this in a search for Python UI libraries, also check out Textual. It's a TUI setup that can also be displayed in web browsers because it uses CSS for styling. At least for fairly uncomplicated UIs, it seems pretty simple, and has good event hook support. It has some features like event bubbling, and reactivity, which smack of JS front-end framework workflows on the data side, for better or worse.
vednig•1mo ago
Python's recent developments have been monumental: new versions now easily best the PyPy performance charts on an M4 MacBook Air. I don't know if this has something to do with optimizations by Apple, but coming from Linux I was surprised.
bboreham•1mo ago
Matt Godbolt was saying recently that using tail-calls for an interpreter suits the branch predictor inside the cpu. Compared to a single big switch / computed jump.
IshKebab•1mo ago
I would have thought it actually helps the branch target predictor rather than the branch predictor. If you assume a simple predictor where the predicted target is just the last taken one then it's going to be wrong almost every time for a single switch. It will only be right for repeats of the exact same instruction.

If you have a separate switch at the end of each instruction then it will be right any time an instruction is followed by the same instruction as last time, which can probably happen quite a lot for short loops.
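
A toy simulation of that argument, assuming exactly the simple "predict the last taken target" model described above. Real predictors also use branch history and are far more capable. The opcode names and the `mispredictions` helper are made up:

```python
def mispredictions(trace, replicated):
    """Count wrong guesses of a 'predict the last target' predictor.

    trace: the sequence of opcodes executed.
    replicated=False: one shared indirect jump (a single big switch).
    replicated=True: one indirect jump per handler, so predictions are
    keyed by the opcode that just ran.
    """
    last_target = {}  # branch site -> target taken last time
    misses = 0
    prev = None  # the opcode whose handler performs the next dispatch
    for op in trace:
        site = prev if replicated else "switch"
        if last_target.get(site) != op:
            misses += 1
        last_target[site] = op
        prev = op
    return misses

# A hypothetical four-instruction loop body, executed 100 times.
trace = ["LOAD", "ADD", "STORE", "JUMP"] * 100

single = mispredictions(trace, replicated=False)
per_handler = mispredictions(trace, replicated=True)
print(single, per_handler)
```

Under this model the single switch misses on every dispatch, because the next opcode is never the same as the previous one, while the per-handler version only misses while warming up.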

croemer•1mo ago
2 typos in first sentence. Is this on purpose to make it obviously not-AI generated?

"apology peice" and "tail caling"

wk_end•1mo ago
If you want to make your writing appear non-AI generated, the easiest way is to write it yourself. No typos necessary.

I’m sure with enough cajoling you can make the LLM spit out a technical blog post that isn’t discernibly slop - wanton emoji usage, clichés, self-aggrandizement, relentlessly chipper tone, short “punchy” paragraphs, an absence of depth, “it’s not just X—it’s a completely new Y” - but it must be at least a little tricky what with how often people don’t bother.

[ChatGPT, insert a complaint about how people need to ram LLMs into every discussion no matter how irrelevant here.]

eru•1mo ago
> If you want to make your writing appear non-AI generated, the easiest way is to write it yourself. No typos necessary.

You can ask the AI to make typos for you.

kenjin4096•1mo ago
Woops, thanks for noticing, fixed!
wk_end•1mo ago
So…if the Python team finds tail calls useful, when are we going to see them in Python?
dodomodo•1mo ago
They find them useful as a performance optimization, not as a design tool. This optimization is not relevant to Python code because it relies on the optimization passes the compiler makes.
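
For illustration, a call in tail position in Python code still consumes a frame today. In this minimal sketch the function names are made up:

```python
import sys

def count_down(n):
    # The recursive call is in tail position, but CPython still
    # creates a new frame for it.
    if n == 0:
        return "done"
    return count_down(n - 1)

def count_down_loop(n):
    # The hand-written loop equivalent runs in constant stack space.
    while n != 0:
        n -= 1
    return "done"

depth = sys.getrecursionlimit() * 10
try:
    recursive_result = count_down(depth)
except RecursionError:
    recursive_result = "RecursionError"

loop_result = count_down_loop(depth)
print(recursive_result, loop_result)
```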
s0a•1mo ago
still python so it's only beating itself
3r7j6qzi9jvnve•1mo ago
I now have to know why subparsers test got 60% slower... (0.3960 in the graph)
malkia•1mo ago
Wow - clojure's recur in C/C++ - awesome!
amai•1mo ago
Is Python now faster than PHP?

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...