But yeah.
Why do people think it's a good trade-off?
Personally I think it is more crazy that you would optimize 99% of the time just to need it for 1% of the time.
The amount of complexity you can code up in a short time, that most everyone can contribute to, is incredible.
Usually they also call out from Python to libraries that are 95% C code.
https://github.com/OpenMathLib/OpenBLAS https://github.com/FFmpeg/FFmpeg
Plenty of assembly in those projects but no mention of it in the README. Most C projects don't acknowledge the assembly they use.
Between those two, performance is most often just fine to trade off.
Also you don't need code to be fast a lot of the time. If you just need some number crunching that is occasionally run by a human, taking a whole second is fine. Pretty good replacement for shell scripting too.
Folks on HN are so weird when it comes to why these languages exist and why people keep writing in them. For all their faults (the dynamism, the GC, the lack of static typing), in the real world with real devs you get code that is more correct, written faster, when you use a higher-level language. It's Go's raison d'etre.
The more interesting question is why the tradeoff was made in the first place.
The answer is, it's relatively easy for us to see and understand the impact of these design decisions because we've been able to see their outcomes over the last 20+ years of Python. Hindsight is 20/20.
Remember that Python was released in 1991, before even Java. What we knew about programming back then vs what we know now is very different.
Oh and also, these tradeoffs are very hard to make in general. A design decision that you may think is irrelevant at the time may in fact end up being crucial to performance later on, but by that point the design is set in stone due to backwards compatibility.
Even if you have to stick to CPython, Numba, Pythran, etc. can give you amazing performance for minimal code changes.
The examples in the article seem gloomy: how could a JIT possibly do all the checks to make sure the arguments aren’t funky before adding them together, in a way that’s meaningfully better than just running the interpreter? But in practice, a JIT can create code that does these checks, and modern processors will branch-predict the happy path and effectively run it in parallel with the checks.
JavaScript, too, has complex prototype chains and common use of boxed objects - but v8 has made common use cases extremely fast. I’m excited for the future of Python.
Most of the time it doesn't matter, most high-throughput python code just invokes C/C++ where these concerns are not as big of a problem. Most JS code just invokes C/C++ browser DOM objects. As long as the hot-path is not in those languages you are not at such high risk of "innocent change tanked performance"
Even server-side most JS/Python/Ruby code is just simple HTTP stack handlers and invoking databases and shuffling data around. And often large part of the process of handling a request (encoding JSON/XML/etc, parsing HTTP messages, etc) can be written in lower-level languages.
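For example, even the JSON step can be pushed down into native code without leaving Python; a minimal sketch, assuming the third-party orjson package (a Rust-backed encoder) is available:

    import orjson  # JSON encoding/decoding implemented in Rust

    payload = {"user": 42, "items": [1, 2, 3]}
    body = orjson.dumps(payload)   # returns bytes, ready to hand to the HTTP stack
    data = orjson.loads(body)      # parses back into plain Python objects
    assert data == payload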
But we don't measure programming language performance in absolute terms. We measure them in relative terms, generally against C. And while your Python code is speculating about how this Python object will be unboxed, where its methods are, how to unbox its parameters, what methods will be called on those, etc., compiled code is speculating on actual code the programmer has written, running that in parallel, such that by the time the Python interpreter is done speculating successfully on how some method call will resolve with actual objects the compiled code language is now done with ~50 lines of code of similar grammatical complexity. (Which is a sloppy term, since this is a bit of a sloppy conversation, but consider a series "p.x = y"-level statements in Python versus C as the case I'm looking at here.)
There's no way around it. You can spend your amazingly capable speculative parallel CPU on churning through Python interpretation or you can spend it on doing real work, but you can't do both.
After all, the interpreter is just C code too. It's not like it gets access to special speculation opcodes that no other program does.
Anyway, I think you’re totally right, in your general message. Python will never be the fastest language in all contexts. Still, there is a lot of room for optimization, and given it’s a popular language, it’s worth the effort.
Just because I don’t write the bounds-checking and type-checking and dynamic-dispatch and error-handling code myself, doesn’t make it any less a conscious decision I made by choosing Python. It’s all “real work.”
Of course, the bank account is only a means to the end of paying the dentist for installing crowns on your teeth and whatnot, and the sound effect is only a means to the end of making your music sound less like Daft Punk or something, so it's kind of fuzzy. It depends on what people are thinking about achieving. As programmers, because we know the experience of late nights debugging when our array bounds overflow, we think of bounds checking and type checking as ends in themselves.
But only up to a point! Often, type checking and bounds checking can be done at compile time, which is more efficient. When we do that, as long as it works correctly, we never† feel disappointed that our program isn't doing run-time type checks. We never look at our running programs and say, "This program would be better if it did more of its type checks at runtime!"
No. Run-time type checking is purely a deadweight loss: wasting some of the CPU on computation that doesn't move the program toward achieving the goals we were trying to achieve when we wrote it. It may be a worthwhile tradeoff (for simplicity of implementation, for example) but we must weigh it on the debit side of the ledger, not the credit side.
______
† Well, unless we're trying to debug a PyPy type-specialization bug or something. Then we might work hard to construct a program that forces PyPy to do more type-checking at runtime, and type checking does become an end.
What do you mean. Daft Punk is not daft punk. Why single them out :)
This is one of the issues with Python I've pointed out before, to the point I suggest that someone could make a language around this idea: https://jerf.org/iri/post/2025/programming_language_ideas/#s... In Python you pay and pay and pay and pay and pay for all this dynamic functionality, but in practice you aren't actually dynamically modifying class hierarchies and attaching arbitrary attributes to arbitrary instances with arbitrary types. You pay for the feature but you benefit from them far less often than the number of times Python is paying for them. Python spends rather a lot of time spinning its wheels double-checking that it's still safe to do the thing it thinks it can do, and it's hard to remove that even in JIT because it is extremely difficult to prove it can eliminate those checks.
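For the instance-attribute case specifically, CPython already lets you opt out of some of that; a minimal sketch using __slots__, which trades away arbitrary attribute attachment for a fixed layout and no per-instance __dict__:

    class Point:
        __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    p.x = 3.0                    # fine: a declared slot
    try:
        p.color = "red"          # the dynamic feature most code never actually uses
    except AttributeError:
        pass                     # rejected, there is no __dict__ to grow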
Because other languages can do that for you too, much much faster...
What interpreter? We’re talking about JITting Python to native code.
At this point, you just shouldn't be making that promise. Decent chance that promise is already older than you are. Just let the performance be what it is, and if you need better performance today, be aware that there are a wide variety of languages of all shapes and sizes standing by to give you ~25-50x better single threaded performance and even more on multi-core performance today if you need it. If you need it, waiting for Python to provide it is not a sensible bet.
Certainly my version would be even faster if I implemented it in C, but the gains of going from exponential to linear completely dominate the language difference.
Have you ever heard of a controlled variable?
This is wrong, I think? The GP is talking about JIT'd code.
With Python that does not work. There are simply more optimization-unfriendly constructs and popular libraries use those. And Python calls arbitrary C libraries with fixed ABI.
So optimizing Python is inherently more difficult.
Isn't v8 still entirely single threaded with limited message passing? Python just went through a lot of work to make multithreaded code faster, it would be disappointing if it had to scrap threading entirely and fall back to multiprocessing on shared memory in order to match v8.
Given the current state of computing, I am unable to state definitively if this suggestion is satire.
Yes, that is literally the explicit point of the talk. The first myth of the article was “Python is not slow”.
The first myth is "Python is not slow" - it is debunked, it is slow.
The second myth is "it's just a glue language / you just need to rewrite the hot parts in C/C++" - it is debunked, just rewriting stuff in C/Rust does not help.
The third myth is "Python is slow because it is interpreted" - it is debunked, it is not slow only because it is interpreted.
For that matter, I recently saw a talk in the Python world that was about convincing people to let their computer do more work locally in general, because computers really are just that fast now.
Except it does. The key is to figure out which part you actually need to go fast, and write it in C. And if most of your use case is dominated by network latency, the Python overhead barely matters anyway.
Overall, people seem to miss the point of Python. The best way to develop software is "make it work, make it good, make it fast" - the first part gets you to an end to end prototype that gives you a testable environment, the second part establishes the robustness and consistency, and the third part lets you focus on optimizing the performance with a robust framework that lets you ensure that your changes are not breaking anything.
Python's focus is on the first part. The idea is that you spend less time making it work. Once you have it working, then it's much easier to do the second part (adding tests, type checking, whatever else), and then the third part. Now with LLMs, it's actually pretty straightforward to take a Python file and translate it to .c/.h files, especially with agents that do additional "thinking" loops.
However, even given all of that, in practice you often don't need to move away from Python. For example, I have a project that datamines Strava Heatmaps (i.e. I download PNG tiles for the entire US). The amount of time it took me to write it in Python plus the time to run it (about a day) is much shorter than it would have taken me to write it in C++/Rust and then run it with the faster processing.
Like, I wouldn't say it's a "myth" that Linux is easy to use.
This is strange. Most people in the programming community know Python is slow. If it has any reputation, it's that it is quite slow.
Ironically Fortran support is one of the reasons CUDA won over OpenCL.
Having said that, plenty of programming languages with JIT/AOT toolchains have nice YAML parsers, I don't see the need to bother with Python for that.
I have also been frustrated while trying to interoperate with expensive proprietary software because documentation was lacking, and the source code was unavailable.
In one instance, a proprietary software had the source code "exposed", which helped me work around its bugs and use it properly (also poorly documented).
There are of course other advantages of having that transparency, like being able to independently audit the code for vulnerabilities or unacceptable "features", and fix those.
Open source is oftentimes a prerequisite for us to be able to control our software.
But honestly the thing that makes any of my programs slow is network calls. And there a nice async setup goes a long way. And then k8s for the scaling.
But yes managing db connections is a pain. But I don’t think it’s any better in Java (my only other reference at this scale)
A bunch of SREs discussing which languages/servers/runtimes are fast/slow/efficient in comparable production setups would give more practical guidance.
If you're building an http daemon in a traditional three-tiered app (like a large % of people on HN), IME, Python has quietly become a great language in that space, compared to its peers, over the last 8 years.
https://pythonspeed.com/articles/python-extension-performanc...
You can avoid that problem to some extent by implementing your own data container as part of your C extension (the article's solution #1); frobbing that from a Python loop can still be significantly faster than allocating and deallocating boxed integers all the time, with dynamic dispatch and reference counting. But, yes, to really get reasonable performance you want to not be running bytecodes in the Python interpreter loop at all (the article's solution #2).
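The standard-library array module is an existing, if limited, example of that first approach: a C-implemented container of unboxed machine values. A rough sketch of the difference:

    from array import array

    xs = array("q", range(1_000_000))   # unboxed 64-bit ints in one contiguous C buffer
    ys = list(range(1_000_000))         # one heap-allocated, refcounted PyLong per element

    # Iterating xs from Python still creates a temporary int object per element,
    # but the storage itself stays flat and cache-friendly.
    total = sum(xs)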
But that's not because of serialization or other kinds of data format translation.
For 99.99% of the programs that people write, modern M.2 NVMe drives are plenty fast, and that's the laziest way to load data into a C extension or process.
Then there are Unix pipes, which are sufficiently fast.
Then there is shared memory, which basically involves no loading.
As with Python, all depends on the setup.
Typically Python is just the entry and exit point (with a little bit of massaging), right?
And then the overwhelming majority of the business logic is done in Rust/C++/Fortran, no?
That is probably why his demo was Sobel edge detection with Numpy. Sobel can run fast enough at standard resolution on a CPU, but once that huge buffer needs to be read or written outside of your fast language, things will get tricky.
This also comes up in Tauri, since you have to bridge between Rust and JS. I'm not sure if Electron apps have the same problem or not.
if you want multiprocessing, use the multiprocessing library, scatter and gather type computation, etc
In my opinion in most cases where you might want to write a project in two languages with FFI, it's usually better not to and just use one language even if that language isn't optimal. In this case, just write the whole thing in C++ (or Rust).
There are some exceptions but generally FFI is a huge cost and Python doesn't bring enough to the table to justify its use if you are already using C++.
As for the language with similar syntax, do you want Nim, Mojo or Scala 3?
Java has similar levels of dynamism (with invokedynamic especially, but already with plain dynamic dispatch), yet in practice the JIT monomorphises to a single class, even though classes default to non-final in Java and there may even be multiple implementations known to the JVM when it monomorphises. Such is the strength of the knowledge that a JIT has compared to a local compiler.
That aside, I was expecting some level of a pedantic argument, and wasn't disappointed by this one:
"A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics."
The problem with this argument is the user isn't trying to do these things, they are trying to do multiplication, so the fact that the language has to do all these things in the end DOES mean it is slow. Why? Because if these things weren't done, the end result could still be achieved. They are pure overhead, for no value in this situation. In other words, if Python had a sufficiently intelligent compiler/JIT, these things could be optimized away (in this use case, but certainly not all). The argument is akin to: "Python isn't slow, it is just doing a lot of work". That might be true, but you can't leave it there. You have to ask if this work has value, and in this case, it does not.
By the same argument, someone could say that any interpreted language that is highly optimized is "fast" because the interpreter itself is optimized. But again, this is the wrong way to think about this. You always have to start by asking "What is the user trying to do? And (in comparison to what is considered a fast language) is it fast to compute?". If the answer is "no", then the language isn't fast, even if it meets the expected objectives. Playing games with things like this is why users get confused on "fast" vs "slow" languages. Slow isn't inherently "bad", but call a spade a spade. In this case, I would say the proper way to talk about this is to say: "It has a fast interpreter". The last word tells any developer with sufficient experience what they need to know (since they understand statically compiled/JIT and interpreted languages are in different speed classes and shouldn't be directly compared for execution speed).
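The dis module makes that hidden work visible; a quick sketch (exact opcode names vary between CPython versions):

    import dis

    def scale(p):
        return p.x * 2

    dis.dis(scale)
    # Roughly: LOAD_FAST p, LOAD_ATTR x (the __getattribute__ machinery),
    # LOAD_CONST 2, BINARY_OP * (unbox, multiply, box a new object), RETURN_VALUE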
> Another "myth" is that Python is slow because it is interpreted; again, there is some truth to that, but interpretation is only a small part of what makes Python slow.
He concedes its slow, he's just saying it's not related to how interpreted it is.
Typically from a user perspective, the initial starting time is either manageable or imperceptible in the cases of long running services, although there are other costs.
If you look at examples that make the above claim, they are almost always tiny toy programs where the cost of producing byte/machine code isn't easily amortized.
This quote from the post is an oversimplification too:
> But the program will then run into Amdahl's law, which says that the improvement for optimizing one part of the code is limited by the time spent in the now-optimized code
I am a huge fan of Amdahl's law, but also realize it is pessimistic and most realistic with parallelization.
It runs into serious issues when you are multiprocessing vs parallel processing, due to preemption etc.
Yes, you still have the costs of abstractions etc., but in today's world, zero pages on AMD, 16k pages and a large number of mapped registers on ARM, barrel shifters, etc. make that much more complicated, especially with C being forced into trampolines and the like.
If you actually trace the CPU operations, the actual operations for 'math' are very similar.
That said modern compilers are a true wonder.
Interpreted language are often all that is necessary and sufficient. Especially when you have Internet, database and other aspects of the system that also restrict the benefits of the speedups due to...Amdahl's law.
In summary, it depends. I am talking about compute performance, not I/O or general purpose task benchmarking. Yes, if you have a mix of compute and I/O (which admittedly is a typical use case), it isn't going to be 20-100x slower, but more likely "only" 3-20x slower. If it is nearly 100% I/O bound, it might not be any slower at all (or even faster if properly buffered). If you are doing number crunching (w/o a C lib like NumPy), your program will likely be 40-100x slower than doing it in C, and many of these aren't toy programs.
Python isn't evaluated line-by-line, even in MicroPython, which is about the only common implementation that doesn't work the same way as CPython.
The CPython VM compiles source to an AST and then to bytecode opcodes, and binary operations just end up popping operands off a stack; or you can reach for something like PyPy.
How efficiently you can keep the pipeline fed is more critical than computation costs.
    int a = 5;
    int b = 10;
    int sum = a + b;

is compiled to something like:

    MOV EAX, 5
    MOV EBX, 10
    ADD EAX, EBX
    MOV [sum], EAX

In the PVM, binary operations remove the top of the stack (TOS) and the second top-most item (TOS1), perform the operation, and put the result back on the stack. That pop, pop isn't much more expensive on modern CPUs, and some C compilers will use a stack depending on many factors. And even in C you have to use structs of arrays etc. depending on the use case. Stalled pipelines and fetch costs are where the huge difference comes from.
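You can see that stack-machine shape directly with the dis module (opcode names differ a bit between CPython versions):

    import dis

    def add():
        a = 5
        b = 10
        return a + b

    dis.dis(add)
    # LOAD_CONST 5 / STORE_FAST a, LOAD_CONST 10 / STORE_FAST b,
    # then LOAD_FAST a, LOAD_FAST b, BINARY_OP + (pop both, push the result), RETURN_VALUE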
It is the setup costs, GC, GIL, etc. that make Python slower in many cases.
While I am not suggesting it is as slow as Python, Java is also bytecode, and often its assumptions and design decisions are even better than, or at least nearly equal to, C's in the general case unless you highly optimize.
But the actual equivalent computations are almost identical, optimizations that the compilers make differ.
> A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics.
i.e.
if(a->type != int_type || b->type != int_type) abort_to_interpreter();
result = ((intval*)a)->val + ((intval*)b)->val;
The CPU does have to execute both lines, but it does them in parallel so it's not as bad as you'd expect. Unless you abort to the interpreter, of course.
In Python, p.x * 2 means dynamic lookup, possible descriptors, big-int overflow checks, etc. A compiler can drop that only if it proves they don’t matter or speculates and adds guards—which is still overhead. That’s why Python is slower on scalar hot loops: not because it’s interpreted, but because its dynamic contract must be honored.
Somehow Smalltalk JIT compilers handle it without major issues.
You get real speed in Python by narrowing the semantics (e.g. via NumPy, Numba, or Cython) not by hoping the compiler outsmarts the language.
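A minimal sketch of what that narrowing buys, assuming NumPy is installed: the element type is pinned up front, so the per-element work runs in typed C loops instead of bytecode:

    import numpy as np

    xs = np.arange(1_000_000, dtype=np.float64)

    # Pure-Python loop: dynamic dispatch, boxing and refcounting on every element.
    slow = [x * 2.0 + 1.0 for x in xs]

    # Vectorised: typed C loops over a contiguous buffer.
    fast = xs * 2.0 + 1.0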
That is to say, everything dynamic that can be used as an excuse for Python, Smalltalk and Self have it, and then some.
Second, at most this describes WHY it is slow, not that it isn't, which is my point. Python is slow. Very slow (esp. for computation heavy workloads). And that is okay, because it does what it needs to do.
I'd argue differently. I'd say the problem isn't that the user is doing those things, it's that the language doesn't know what he's trying to do.
Python's explicit goal was always ergonomics, and it was always ergonomics over speed or annoying compile-time error messages. "Just run the code as written dammit" was always the goal. I remember when the new class model was introduced, necessitating the introduction of __getattribute__. My first reaction as a C programmer was "gee you took a speed hit there". A later reaction was to use it to twist the new system into something its inventors possibly never thought of: an LR(1) parser that let you write grammars as regular Python statements.
While they may not have thought of abusing the language in that particular way, I'm sure the explicit goal was to create a framework that allowed any idea to be expressed with minimal code. Others also used the hooks they provided into the way the language builds classes to create things like Pydantic and Spyne. Spyne, for example, lets you express the on-the-wire serialisation formats used by RPC as Python class declarations, and then compile them into JSON, XML, SOAP or whatever. SQLAlchemy lets you express SQL using Python syntax, although in a more straightforward way.
All of them are very clever in how they twist the language. Inside those frameworks, "a = b + c" does not mean "add b to c, and place the result in a". In the LR(1) parser, for example, it means "there is a production called 'a', that is a 'b' followed by a 'c'". 'a' in that formulation holds references to 'b' and 'c'. Later the LR(1) parser will consume that, compiling it into something very different. The result is a long way from two's complement addition.
It is possible to use a more powerful type system in a similar way. For example, I've seen FPGA designs expressed in Scala. However, because Scala's type system insists on knowing what is going on at compile time, Scala had a fair idea of what the programmer was building. The compiled result isn't going to be much slower than any other code. Python achieved the same flexibility by abandoning type checking at compile time almost entirely, pushing it all to run time. Thus the compiler has no idea of what is going to be executed in the end (the + operation in the LR parser only gets executed once, for example), which is what I said above: "it's that the language doesn't know what the programmer is trying to do".
You argue that since it's an interpreted language, it's the interpreter's job to figure out what the programmer is trying to do at run time. Surely it can figure out that "a = b + c" really is adding two 32-bit integers that won't overflow. That's true, but that creates a lot of work to do at run time. Which is a roundabout way of saying the same thing as the talk: electing to do it at run time means the language chose flexibility over speed.
You can't always fix this in an interpreter. JavaScript has some of the best interpreters around, and they do make the happy path run quickly. But those interpreters come with caveats, usually of the form "if you muck around with the internals of classes, by say replacing function definitions at run time, we abandon all attempts to JIT it". People don't typically do such things in JavaScript, but as it happens, Python's design, with its metaclasses, dynamic types created with "type(...)", and "__new__(...)", could almost be said to encourage that coding style. That is, again, a language design choice, and it's one that favours flexibility over speed.
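A tiny example of the kind of thing that is perfectly legal Python and poison for a specialising JIT (hypothetical names, just to show the shape):

    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()
    print(g.greet())                        # a JIT would love to specialise this call site

    # ...and then at run time the method is swapped out from under it:
    Greeter.greet = lambda self: "bonjour"
    print(g.greet())                        # every cached assumption is now invalid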
Hence, Numba.
At that point you'd maybe want to have some sort of broader way to signify which parts of your script are dynamic. But then, you'd have a language that can be dynamic even in how dynamic it is…
While performance (however you may mean that) is always a worthy goal, you may need to question your choice of language if you start hitting performance ceilings.
As the saying goes - "Use the right tool for the job." Use case should dictate tech choices, with few exceptions.
Ok, now that I have said my piece, now you can down vote me :)
Ok, you are not competing with C++, but you also shouldn't be redoing all the calculations because you haven't figured out the data access pattern.
I think the term "Pythonistas" is more widely used
> you may need to question your choice of language if you start hitting performance ceilings.
Developers should also question if a "fast" language like Rust is really needed, if implementing a feature takes longer than it would in Python.
I don't like bloat in general, but sometimes it can be worth spinning up a few extra instances to get to market faster. If Python lets you implement a feature a month earlier, the new sales may even cover the additional infrastructure costs.
Once you reach a certain scale you may need to rewrite parts of your system anyway, because the assumptions you made are often wrong.
Agreed.
Python + C covers pretty much anything you really ever need to build, unless you are doing something with game engines that require the use of C++/C#. Rust is even more niche.
Python has none of that. It's a hyper-bloated language with extremely poor design choices all around. Many ways of doing the same thing, many ways of doing stupid things, no way of communicating programmer's intention to the compiler... So why even bother? Why not use a language that's designed by a sensible designer for this specific purpose?
The news about performance improvements in Python just sound to me like spending useful resources on useless goals. We aren't going forward by making Python slightly faster and slightly more bloated, we just make this bad language even harder to get rid of.
C++ has great support too but often isn't usable in communities involving researchers and juniors because it's too hard for them. Startup costs are also much higher.
And so you're often stuck with Python.
We desperately need good math/AI support in languages that are faster than Python but easier than C++. C#? Java?
My feeling is that numba has exactly the right tactic here. Don't try to subset python from on high, give developers the tools[1] so that they can limit themselves to the fast subset, for the code they actually want. And let them make the call.
(The one thing numba completely fails on though is that it insists on using its own 150+MB build of LLVM, so it's not nearly as cleanly deployable as you'd hope. Come on folks, if you use the system libc you should be prepared to use the system toolchain.)
[1] Simple ones, even. I mean, to first approximation you just put "@jit" on the stuff you want fast and make sure it sticks to a single numeric type and numpy arrays instead of python data structures, and you're done.
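For anyone who hasn't seen it, the shape is roughly this (a sketch assuming numba and numpy are installed; @njit is the nopython-mode spelling of @jit):

    import numpy as np
    from numba import njit   # njit == jit(nopython=True)

    @njit
    def total(xs):
        acc = 0.0
        for x in xs:          # a plain loop, compiled to machine code on first call
            acc += x
        return acc

    xs = np.random.rand(1_000_000)
    print(total(xs))          # first call pays compilation, later calls run at native speed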
These features have one thing in common: they're only useful for prototype-quality throwaway code, if at all. Once your needs shift to an increased focus on production use and maintainability, they become serious warts. It's not just about performance (though it's obviously a factor too), there's real reasons why most languages don't do this.
As a matter of practice: the python community disagrees strongly. And the python community ate the world.
It's fine to have an opinion, but you're not going to change python.
Better things are possible, and I'm hoping that higher average quality of Python code is one of those things.
True, the view you express here has strong support in the community and possibly in the steering committee.
But there are differing ideas on what python is and why it's successful.
It's exactly the opposite! I'm saying that python is BIG AND DIVERSE and that attempts like SPy to invent a new (monolithic!) subset language that everyone should use instead are doomed, because it won't meet the needs of all the yahoos out there doing weird stuff the SPy authors didn't think was important.
It's fine to have "differing ideas on what python is", but if those ideas don't match those of all of the community, and not just what you think are the good parts, it's not really about what "python" is, is it?
The subset I've been working with is even narrower. Given my stance on pattern matching, it may not even be a subset.
https://github.com/py2many/py2many/blob/main/doc/langspec.md
import ctypes
ten = 10
addr = id(ten)
class PyLongObject(ctypes.Structure):
_fields_ = [
("ob_refcnt", ctypes.c_ssize_t),
("ob_type", ctypes.c_void_p),
("ob_size", ctypes.c_ssize_t),
("ob_digit", ctypes.c_uint32 * 1),
]
long_obj = PyLongObject.from_address(addr)
long_obj.ob_digit[0] = 3
assert 10 == 3
# using an auxiliary variable to prevent any inlining
# done at the interpreter level before actually querying
# the value of the literal `10`
x = 3
assert 10 * x == 9
assert 10 + x == 6
The absurd example of overwriting the literal `10` is "obviously" bad, but your assertion that the interpreter should be able to assume nobody is overwriting its memory isn't borne out in practice.
What, mutating the data representation of built-in types documented to be immutable? For what purpose?
Eventually LLMs might even generate executables directly.
A decent case of Python 4.0?
> So, maybe, "a JIT compiler can solve all of your problems"; they can go a long way toward making Python, or any dynamic language, faster, Cuni said. But that leads to "a more subtle problem". He put up a slide with a trilemma triangle: a dynamic language, speed, or a simple implementation. You can have two of those, but not all three.
This trilemma keeps getting me back towards Julia. It's less simple than Python, but much faster (mitigated by pre-compilation time), and almost as dynamic. I'm glad this language didn't die.
I definitely agree with this eventually, but for now why not just let developers set `dynamic=False` on objects and make it opt in? This is how Google handles breaking Angular upgrades, and in practice it works great because people have multiple years to prepare for any breaking changes.
I think "Python 4.0" is going to have to be effectively a new language by a different team that simply happens to bear strong syntactic similarities. (And at least part of why that isn't already happening is that everyone keeps getting scared off by the scale of the task.)
Thanks for the reminder that I never got around to checking out Julia.
Personally I'd be more interested in designing from scratch.
I love Python. It's amazing with uv; I just implemented a simple CLI this morning for analyzing data with inline dependencies that's absolutely perfect for what I need and is extremely easy to write, run, and tweak.
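That inline-dependencies trick is the PEP 723 script-metadata block that uv understands; a minimal sketch (the file name and the polars dependency are just examples):

    # /// script
    # requires-python = ">=3.12"
    # dependencies = ["polars"]
    # ///
    import polars as pl

    # Run with: uv run analyze.py -- uv resolves and installs polars on the fly.
    df = pl.read_csv("data.csv")   # hypothetical input file
    print(df.describe())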
Based on previous experience, I would not suggest Python should be used for an API server where performance - latency, throughput - and scalability of requests is a concern. There's lots of other great tools for that. And if you need to write an API server and it's ok not to have super high performance, then yeah Python is great for that, too.
But it's great for what it is. If they do make a Python 4.0 with some breaking changes, I hope they keep the highly interpreted nature such that something like Pydantic continues to work.
I don't understand how we had super dynamic systems decades ago that were easier to optimize than people care to understand. Heaven help folks if they ever get a chance to use Mathematica.
https://docs.modular.com/mojo/why-mojo/#a-member-of-the-pyth...
Wrong question
Maybe something like, "Python startup time is as fast as other interpreters"
Comparatively, Python (startup time) is slow(er)
Examples such as Numba JIT for numerical computation, Bodo JIT/dataframes for data processing, and PyTorch for deep learning demonstrate this clearly. Python’s flexible syntax enables creating complex objects and their operators such as array and dataframe operations, which these compilers efficiently transform into code approaching C++-level performance. DSL operator implementations can also leverage lower-level languages such as C++ or Rust when necessary. Another important aspect not addressed in the article is parallelism, which DSL compilers typically handle quite effectively.
Given that data science and AI are major use cases for Python, compilers like Numba, Bodo, and PyTorch illustrate how many performance-critical scenarios can already be effectively addressed. Investing further in DSL compilers presents a practical pathway to enhancing Python’s performance and scalability across numerous domains, without compromising developer usability and productivity.
Disclaimer: I have previously worked on Numba and Bodo JIT.
The conclusion is logically flawed: it conflates language popularity with performance, and conference attendance and widespread use are sociological indicators, not evidence of Python's performance. Conflating the two is intellectually negligent.
Additionally, Python's speed is largely due to C extensions handling performance-critical tasks, not the interpreter itself. Perl, however, is often faster even in pure code, especially for text processing and regex, thanks to its optimized engine, making it inherently quicker in many common scenarios.
So let's see what remains from the CPython performance efforts.