But yeah.
Why do people think it's a good trade-off?
Personally I think it is more crazy that you would optimize 99% of the time just to need it for 1% of the time.
The amount of complexity you can code up in a short time, that most everyone can contribute to, is incredible.
Usually they also call out from Python to libraries that are 95% C code.
https://github.com/OpenMathLib/OpenBLAS https://github.com/FFmpeg/FFmpeg
Plenty of assembly in those projects but no mention of it in the README. Most C projects don't acknowledge the assembly they use.
Between those two, performance is most often just fine to trade off.
Also you don't need code to be fast a lot of the time. If you just need some number crunching that is occasionally run by a human, taking a whole second is fine. Pretty good replacement for shell scripting too.
Folks on HN are so weird when it comes to why these languages exist and why people keep writing in them. For all their faults (the dynamism, the GC, the lack of static typing), in the real world with real devs you get code that is more correct, written faster, when you use a higher-level language. It's Go's raison d'etre.
The more interesting question is why the tradeoff was made in the first place.
The answer is, it's relatively easy for us to see and understand the impact of these design decisions because we've been able to see their outcomes over the last 20+ years of Python. Hindsight is 20/20.
Remember that Python was released in 1991, before even Java. What we knew about programming back then vs what we know now is very different.
Oh and also, these tradeoffs are very hard to make in general. A design decision that you may think is irrelevant at the time may in fact end up being crucial to performance later on, but by that point the design is set in stone due to backwards compatibility.
Even if you have to stick to CPython, Numba, Pythran, etc. can give you amazing performance for minimal code changes.
The examples in the article seem gloomy: how could a JIT possibly do all the checks to make sure the arguments aren’t funky before adding them together, in a way that’s meaningfully better than just running the interpreter? But in practice, a JIT can create code that does these checks, and modern processors will branch-predict the happy path and effectively run it in parallel with the checks.
JavaScript, too, has complex prototype chains and common use of boxed objects - but v8 has made common use cases extremely fast. I’m excited for the future of Python.
Most of the time it doesn't matter, most high-throughput python code just invokes C/C++ where these concerns are not as big of a problem. Most JS code just invokes C/C++ browser DOM objects. As long as the hot-path is not in those languages you are not at such high risk of "innocent change tanked performance"
Even server-side most JS/Python/Ruby code is just simple HTTP stack handlers and invoking databases and shuffling data around. And often large part of the process of handling a request (encoding JSON/XML/etc, parsing HTTP messages, etc) can be written in lower-level languages.
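For example, even the JSON step can be pushed down into native code without leaving Python; a minimal sketch, assuming the third-party orjson package (a Rust-backed encoder) is available:

    import orjson  # JSON encoding/decoding implemented in Rust

    payload = {"user": 42, "items": [1, 2, 3]}
    body = orjson.dumps(payload)   # returns bytes, ready to hand to the HTTP stack
    data = orjson.loads(body)      # parses back into plain Python objects
    assert data == payload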
But we don't measure programming language performance in absolute terms. We measure them in relative terms, generally against C. And while your Python code is speculating about how this Python object will be unboxed, where its methods are, how to unbox its parameters, what methods will be called on those, etc., compiled code is speculating on actual code the programmer has written, running that in parallel, such that by the time the Python interpreter is done speculating successfully on how some method call will resolve with actual objects the compiled code language is now done with ~50 lines of code of similar grammatical complexity. (Which is a sloppy term, since this is a bit of a sloppy conversation, but consider a series "p.x = y"-level statements in Python versus C as the case I'm looking at here.)
There's no way around it. You can spend your amazingly capable speculative parallel CPU on churning through Python interpretation or you can spend it on doing real work, but you can't do both.
After all, the interpreter is just C code too. It's not like it gets access to special speculation opcodes that no other program does.
Anyway, I think you’re totally right, in your general message. Python will never be the fastest language in all contexts. Still, there is a lot of room for optimization, and given it’s a popular language, it’s worth the effort.
Just because I don’t write the bounds-checking and type-checking and dynamic-dispatch and error-handling code myself, doesn’t make it any less a conscious decision I made by choosing Python. It’s all “real work.”
Of course, the bank account is only a means to the end of paying the dentist for installing crowns on your teeth and whatnot, and the sound effect is only a means to the end of making your music sound less like Daft Punk or something, so it's kind of fuzzy. It depends on what people are thinking about achieving. As programmers, because we know the experience of late nights debugging when our array bounds overflow, we think of bounds checking and type checking as ends in themselves.
But only up to a point! Often, type checking and bounds checking can be done at compile time, which is more efficient. When we do that, as long as it works correctly, we never† feel disappointed that our program isn't doing run-time type checks. We never look at our running programs and say, "This program would be better if it did more of its type checks at runtime!"
No. Run-time type checking is purely a deadweight loss: wasting some of the CPU on computation that doesn't move the program toward achieving the goals we were trying to achieve when we wrote it. It may be a worthwhile tradeoff (for simplicity of implementation, for example) but we must weigh it on the debit side of the ledger, not the credit side.
______
† Well, unless we're trying to debug a PyPy type-specialization bug or something. Then we might work hard to construct a program that forces PyPy to do more type-checking at runtime, and type checking does become an end.
What do you mean. Daft Punk is not daft punk. Why single them out :)
This is one of the issues with Python I've pointed out before, to the point I suggest that someone could make a language around this idea: https://jerf.org/iri/post/2025/programming_language_ideas/#s... In Python you pay and pay and pay and pay and pay for all this dynamic functionality, but in practice you aren't actually dynamically modifying class hierarchies and attaching arbitrary attributes to arbitrary instances with arbitrary types. You pay for the feature but you benefit from them far less often than the number of times Python is paying for them. Python spends rather a lot of time spinning its wheels double-checking that it's still safe to do the thing it thinks it can do, and it's hard to remove that even in JIT because it is extremely difficult to prove it can eliminate those checks.
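For the instance-attribute case specifically, CPython already lets you opt out of some of that; a minimal sketch using __slots__, which trades away arbitrary attribute attachment for a fixed layout and no per-instance __dict__:

    class Point:
        __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    p.x = 3.0                    # fine: a declared slot
    try:
        p.color = "red"          # the dynamic feature most code never actually uses
    except AttributeError:
        pass                     # rejected, there is no __dict__ to grow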
Because other languages can do that for you too, much much faster...
What interpreter? We’re talking about JITting Python to native code.
At this point, you just shouldn't be making that promise. Decent chance that promise is already older than you are. Just let the performance be what it is, and if you need better performance today, be aware that there are a wide variety of languages of all shapes and sizes standing by to give you ~25-50x better single threaded performance and even more on multi-core performance today if you need it. If you need it, waiting for Python to provide it is not a sensible bet.
Certainly my version would be even faster if I implemented it in C, but the gains of going from exponential to linear completely dominate the language difference.
Have you ever heard of a controlled variable?
This is wrong, I think? The GP is talking about JIT'd code.
With Python that does not work. There are simply more optimization-unfriendly constructs and popular libraries use those. And Python calls arbitrary C libraries with fixed ABI.
So optimizing Python is inherently more difficult.
Isn't v8 still entirely single threaded with limited message passing? Python just went through a lot of work to make multithreaded code faster, it would be disappointing if it had to scrap threading entirely and fall back to multiprocessing on shared memory in order to match v8.
Given the current state of computing, I am unable to state definitively if this suggestion is satire.
Yes, that is literally the explicit point of the talk. The first myth of the article was “Python is not slow”.
The first myth is "Python is not slow" - it is debunked, it is slow.
The second myth is "it's just a glue language / you just need to rewrite the hot parts in C/C++" - it is debunked, just rewriting stuff in C/Rust does not help.
The third myth is "Python is slow because it is interpreted" - it is debunked, it is not slow only because it is interpreted.
For that matter, I recently saw a talk in the Python world that was about convincing people to let their computer do more work locally in general, because computers really are just that fast now.
Except it does. The key is to figure out which part you actually need to go fast, and write it in C. And if most of your use case is dominated by network latency, the Python overhead barely matters anyway.
Overall, people seem to miss the point of Python. The best way to develop software is "make it work, make it good, make it fast" - the first part gets you to an end to end prototype that gives you a testable environment, the second part establishes the robustness and consistency, and the third part lets you focus on optimizing the performance with a robust framework that lets you ensure that your changes are not breaking anything.
Python's focus is on the first part. The idea is that you spend less time making it work. Once you have it working, then it's much easier to do the second part (adding tests, type checking, whatever else), and then the third part. Now with LLMs, it's actually pretty straightforward to take a Python file and translate it to .c/.h files, especially with agents that do additional "thinking" loops.
However, even given all of that, in practice you often don't need to move away from Python. For example, I have a project that datamines Strava Heatmaps (i.e. I download PNG tiles for the entire US). The amount of time it took me to write it in Python plus the time to run it (about a day) is much shorter than it would have taken me to write it in C++/Rust and then run it with the faster processing.
Like, I wouldn't say it's a "myth" that Linux is easy to use.
This is strange. Most people in the programming community know Python is slow. If it has any reputation, it's that it is quite slow.
Ironically Fortran support is one of the reasons CUDA won over OpenCL.
Having said that, plenty of programming languages with JIT/AOT toolchains have nice YAML parsers, I don't see the need to bother with Python for that.
I have also been frustrated while trying to interoperate with expensive proprietary software because documentation was lacking, and the source code was unavailable.
In one instance, a proprietary software had the source code "exposed", which helped me work around its bugs and use it properly (also poorly documented).
There are of course other advantages of having that transparency, like being able to independently audit the code for vulnerabilities or unacceptable "features", and fix those.
Open source is oftentimes a prerequisite for us to be able to control our software.
But honestly the thing that makes any of my programs slow is network calls. And there a nice async setup goes a long way. And then k8s for the scaling.
But yes managing db connections is a pain. But I don’t think it’s any better in Java (my only other reference at this scale)
A bunch of SREs discussing which languages/servers/runtimes are fast/slow/efficient in comparable production setups would give more practical guidance.
If you're building an http daemon in a traditional three-tiered app (like a large % of people on HN), IME, Python has quietly become a great language in that space, compared to its peers, over the last 8 years.
https://pythonspeed.com/articles/python-extension-performanc...
You can avoid that problem to some extent by implementing your own data container as part of your C extension (the article's solution #1); frobbing that from a Python loop can still be significantly faster than allocating and deallocating boxed integers all the time, with dynamic dispatch and reference counting. But, yes, to really get reasonable performance you want to not be running bytecodes in the Python interpreter loop at all (the article's solution #2).
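The standard-library array module is an existing, if limited, example of that first approach: a C-implemented container of unboxed machine values. A rough sketch of the difference:

    from array import array

    xs = array("q", range(1_000_000))   # unboxed 64-bit ints in one contiguous C buffer
    ys = list(range(1_000_000))         # one heap-allocated, refcounted PyLong per element

    # Iterating xs from Python still creates a temporary int object per element,
    # but the storage itself stays flat and cache-friendly.
    total = sum(xs)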
But that's not because of serialization or other kinds of data format translation.
For 99.99% of the programs that people write, modern M.2 NVMe drives are plenty fast, and that's the laziest way to load data into a C extension or process.
Then there are Unix pipes, which are sufficiently fast.
Then there is shared memory, which basically involves no loading.
As with Python, all depends on the setup.
Typically Python is just the entry and exit point (with a little bit of massaging), right?
And then the overwhelming majority of the business logic is done in Rust/C++/Fortran, no?
That is probably why his demo was Sobel edge detection with Numpy. Sobel can run fast enough at standard resolution on a CPU, but once that huge buffer needs to be read or written outside of your fast language, things will get tricky.
This also comes up in Tauri, since you have to bridge between Rust and JS. I'm not sure if Electron apps have the same problem or not.
if you want multiprocessing, use the multiprocessing library, scatter and gather type computation, etc
In my opinion in most cases where you might want to write a project in two languages with FFI, it's usually better not to and just use one language even if that language isn't optimal. In this case, just write the whole thing in C++ (or Rust).
There are some exceptions but generally FFI is a huge cost and Python doesn't bring enough to the table to justify its use if you are already using C++.
As for the language with similar syntax, do you want Nim, Mojo or Scala 3?
Java has similar levels of dynamism (with invokedynamic especially, but already with plain dynamic dispatch), yet in practice the JIT monomorphises to a single class, even though classes default to non-final in Java and there may even be multiple implementations known to the JVM when it monomorphises. Such is the strength of the knowledge that a JIT has compared to a local compiler.
That aside, I was expecting some level of a pedantic argument, and wasn't disappointed by this one:
"A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics."
The problem with this argument is the user isn't trying to do these things, they are trying to do multiplication, so the fact that the language has to do all these things in the end DOES mean it is slow. Why? Because if these things weren't done, the end result could still be achieved. They are pure overhead, for no value in this situation. In other words, if Python had a sufficiently intelligent compiler/JIT, these things could be optimized away (in this use case, but certainly not all). The argument is akin to: "Python isn't slow, it is just doing a lot of work". That might be true, but you can't leave it there. You have to ask if this work has value, and in this case, it does not.
By the same argument, someone could say that any interpreted language that is highly optimized is "fast" because the interpreter itself is optimized. But again, this is the wrong way to think about this. You always have to start by asking "What is the user trying to do? And (in comparison to what is considered a fast language) is it fast to compute?". If the answer is "no", then the language isn't fast, even if it meets the expected objectives. Playing games with things like this is why users get confused on "fast" vs "slow" languages. Slow isn't inherently "bad", but call a spade a spade. In this case, I would say the proper way to talk about this is to say: "It has a fast interpreter". The last word tells any developer with sufficient experience what they need to know (since they understand statically compiled/JIT and interpreted languages are in different speed classes and shouldn't be directly compared for execution speed).
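The dis module makes that hidden work visible; a quick sketch (exact opcode names vary between CPython versions):

    import dis

    def scale(p):
        return p.x * 2

    dis.dis(scale)
    # Roughly: LOAD_FAST p, LOAD_ATTR x (the __getattribute__ machinery),
    # LOAD_CONST 2, BINARY_OP * (unbox, multiply, box a new object), RETURN_VALUE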
> Another "myth" is that Python is slow because it is interpreted; again, there is some truth to that, but interpretation is only a small part of what makes Python slow.
He concedes its slow, he's just saying it's not related to how interpreted it is.
Typically from a user perspective, the initial starting time is either manageable or imperceptible in the cases of long running services, although there are other costs.
If you look at examples that make the above claim, they are almost always tiny toy programs where the cost of producing byte/machine code isn't easily amortized.
This quote from the post is an oversimplification too:
> But the program will then run into Amdahl's law, which says that the improvement for optimizing one part of the code is limited by the time spent in the now-optimized code
I am a huge fan of Amdahl's law, but also realize it is pessimistic and most realistic with parallelization.
It runs into serious issues when you are multiprocessing vs parallel processing, due to preemption etc.
Yes, you still have the costs of abstractions etc., but in today's world, zero pages on AMD, 16k pages and a large number of mapped registers on ARM, barrel shifters, etc. make that much more complicated, especially with C being forced into trampolines and the like.
If you actually trace the CPU operations, the actual operations for 'math' are very similar.
That said modern compilers are a true wonder.
Interpreted language are often all that is necessary and sufficient. Especially when you have Internet, database and other aspects of the system that also restrict the benefits of the speedups due to...Amdahl's law.
In summary, it depends. I am talking about compute performance, not I/O or general purpose task benchmarking. Yes, if you have a mix of compute and I/O (which admittedly is a typical use case), it isn't going to be 20-100x slower, but more likely "only" 3-20x slower. If it is nearly 100% I/O bound, it might not be any slower at all (or even faster if properly buffered). If you are doing number crunching (w/o a C lib like NumPy), your program will likely be 40-100x slower than doing it in C, and many of these aren't toy programs.
Python isn't evaluated line-by-line, even in MicroPython, which is about the only common implementation that doesn't work the same way as CPython.
The CPython VM compiles source to an AST and then to bytecode opcodes, and binary operations just end up popping operands off a stack; or you can reach for something like PyPy.
How efficiently you can keep the pipeline fed is more critical than computation costs.
    int a = 5;
    int b = 10;
    int sum = a + b;

is compiled to something like:

    MOV EAX, 5
    MOV EBX, 10
    ADD EAX, EBX
    MOV [sum], EAX

In the PVM, binary operations remove the top of the stack (TOS) and the second top-most item (TOS1), perform the operation, and put the result back on the stack. That pop, pop isn't much more expensive on modern CPUs, and some C compilers will use a stack depending on many factors. And even in C you have to use structs of arrays etc. depending on the use case. Stalled pipelines and fetch costs are where the huge difference comes from.
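You can see that stack-machine shape directly with the dis module (opcode names differ a bit between CPython versions):

    import dis

    def add():
        a = 5
        b = 10
        return a + b

    dis.dis(add)
    # LOAD_CONST 5 / STORE_FAST a, LOAD_CONST 10 / STORE_FAST b,
    # then LOAD_FAST a, LOAD_FAST b, BINARY_OP + (pop both, push the result), RETURN_VALUE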
It is the setup costs, GC, GIL, etc. that make Python slower in many cases.
While I am not suggesting it is as slow as Python, Java is also bytecode, and often its assumptions and design decisions are even better than, or at least nearly equal to, C's in the general case unless you highly optimize.
But the actual equivalent computations are almost identical, optimizations that the compilers make differ.
> A compiler for C/C++/Rust could turn that kind of expression into three operations: load the value of x, multiply it by two, and then store the result. In Python, however, there is a long list of operations that have to be performed, starting with finding the type of p, calling its __getattribute__() method, through unboxing p.x and 2, to finally boxing the result, which requires memory allocation. None of that is dependent on whether Python is interpreted or not, those steps are required based on the language semantics.
i.e.
if(a->type != int_type || b->type != int_type) abort_to_interpreter();
result = ((intval*)a)->val + ((intval*)b)->val;
The CPU does have to execute both lines, but it does them in parallel so it's not as bad as you'd expect. Unless you abort to the interpreter, of course.
In Python, p.x * 2 means dynamic lookup, possible descriptors, big-int overflow checks, etc. A compiler can drop that only if it proves they don’t matter or speculates and adds guards—which is still overhead. That’s why Python is slower on scalar hot loops: not because it’s interpreted, but because its dynamic contract must be honored.
Somehow Smalltalk JIT compilers handle it without major issues.
You get real speed in Python by narrowing the semantics (e.g. via NumPy, Numba, or Cython) not by hoping the compiler outsmarts the language.
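A minimal sketch of what that narrowing buys, assuming NumPy is installed: the element type is pinned up front, so the per-element work runs in typed C loops instead of bytecode:

    import numpy as np

    xs = np.arange(1_000_000, dtype=np.float64)

    # Pure-Python loop: dynamic dispatch, boxing and refcounting on every element.
    slow = [x * 2.0 + 1.0 for x in xs]

    # Vectorised: typed C loops over a contiguous buffer.
    fast = xs * 2.0 + 1.0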
That is to say, everything dynamic that can be used as an excuse for Python, Smalltalk and Self have it, and then some.
Second, at most this describes WHY it is slow, not that it isn't, which is my point. Python is slow. Very slow (esp. for computation heavy workloads). And that is okay, because it does what it needs to do.
I'd argue differently. I'd say the problem isn't that the user is doing those things, it's that the language doesn't know what he's trying to do.
Python's explicit goal was always ergonomics, and it was always ergonomics over speed or annoying compile-time error messages. "Just run the code as written dammit" was always the goal. I remember when the new class model was introduced, necessitating the introduction of __getattribute__. My first reaction as a C programmer was "gee you took a speed hit there". A later reaction was to use it to twist the new system into something its inventors possibly never thought of: an LR(1) parser that let you write grammars as regular Python statements.
While they may not have thought of abusing the language in that particular way, I'm sure the explicit goal was to create a framework that allowed any idea to be expressed with minimal code. Others also used the hooks they provided into the way the language builds classes to create things like Pydantic and Spyne. Spyne, for example, lets you express the on-the-wire serialisation formats used by RPC as Python class declarations, and then compile them into JSON, XML, SOAP or whatever. SQLAlchemy lets you express SQL using Python syntax, although in a more straightforward way.
All of them are very clever in how they twist the language. Inside those frameworks, "a = b + c" does not mean "add b to c, and place the result in a". In the LR(1) parser, for example, it means "there is a production called 'a', that is a 'b' followed by a 'c'". 'a' in that formulation holds references to 'b' and 'c'. Later the LR(1) parser will consume that, compiling it into something very different. The result is a long way from two's complement addition.
It is possible to use a more powerful type system in a similar way. For example, I've seen FPGA designs expressed in Scala. However, because Scala's type system insists on knowing what is going on at compile time, Scala had a fair idea of what the programmer was building. The compiled result isn't going to be much slower than any other code. Python achieved the same flexibility by abandoning type checking at compile time almost entirely, pushing it all to run time. Thus the compiler has no idea of what is going to be executed in the end (the + operation in the LR parser only gets executed once, for example), which is what I said above: "it's that the language doesn't know what the programmer is trying to do".
You argue that since it's an interpreted language, it's the interpreter's job to figure out what the programmer is trying to do at run time. Surely it can figure out that "a = b + c" really is adding two 32-bit integers that won't overflow. That's true, but that creates a lot of work to do at run time. Which is a roundabout way of saying the same thing as the talk: electing to do it at run time means the language chose flexibility over speed.
You can't always fix this in an interpreter. JavaScript has some of the best interpreters around, and they do make the happy path run quickly. But those interpreters come with caveats, usually of the form "if you muck around with the internals of classes, by say replacing function definitions at run time, we abandon all attempts to JIT it". People don't typically do such things in JavaScript, but as it happens, Python's design, with its metaclasses, dynamic types created with "type(...)", and "__new__(...)", could almost be said to encourage that coding style. That is, again, a language design choice, and it's one that favours flexibility over speed.
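A tiny example of the kind of thing that is perfectly legal Python and poison for a specialising JIT (hypothetical names, just to show the shape):

    class Greeter:
        def greet(self):
            return "hello"

    g = Greeter()
    print(g.greet())                        # a JIT would love to specialise this call site

    # ...and then at run time the method is swapped out from under it:
    Greeter.greet = lambda self: "bonjour"
    print(g.greet())                        # every cached assumption is now invalid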
Hence, Numba.
At that point you'd maybe want to have some sort of broader way to signify which parts of your script are dynamic. But then, you'd have a language that can be dynamic even in how dynamic it is…
While performance (however you may mean that) is always a worthy goal, you may need to question your choice of language if you start hitting performance ceilings.
As the saying goes - "Use the right tool for the job." Use case should dictate tech choices, with few exceptions.
Ok, now that I have said my piece, now you can down vote me :)
Ok, you are not competing with C++, but you also shouldn't be redoing all the calculations because you haven't figured out the data access pattern.
I think the term "Pythonistas" is more widely used
> you may need to question your choice of language if you start hitting performance ceilings.
Developers should also question if a "fast" language like Rust is really needed, if implementing a feature takes longer than it would in Python.
I don't like bloat in general, but sometimes it can be worth spinning up a few extra instances to get to market faster. If Python lets you implement a feature a month earlier, the new sales may even cover the additional infrastructure costs.
Once you reach a certain scale you may need to rewrite parts of your system anyway, because the assumptions you made are often wrong.
Agreed.
Python + C covers pretty much anything you really ever need to build, unless you are doing something with game engines that require the use of C++/C#. Rust is even more niche.
Python has none of that. It's a hyper-bloated language with extremely poor design choices all around. Many ways of doing the same thing, many ways of doing stupid things, no way of communicating programmer's intention to the compiler... So why even bother? Why not use a language that's designed by a sensible designer for this specific purpose?
The news about performance improvements in Python just sound to me like spending useful resources on useless goals. We aren't going forward by making Python slightly faster and slightly more bloated, we just make this bad language even harder to get rid of.
C++ has great support too but often isn't usable in communities involving researchers and juniors because it's too hard for them. Startup costs are also much higher.
And so you're often stuck with Python.
We desperately need good math/AI support in languages that are faster than Python but easier than C++. C#? Java?
My feeling is that numba has exactly the right tactic here. Don't try to subset python from on high, give developers the tools[1] so that they can limit themselves to the fast subset, for the code they actually want. And let them make the call.
(The one thing numba completely fails on though is that it insists on using its own 150+MB build of LLVM, so it's not nearly as cleanly deployable as you'd hope. Come on folks, if you use the system libc you should be prepared to use the system toolchain.)
[1] Simple ones, even. I mean, to first approximation you just put "@jit" on the stuff you want fast and make sure it sticks to a single numeric type and numpy arrays instead of python data structures, and you're done.
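For anyone who hasn't seen it, the shape is roughly this (a sketch assuming numba and numpy are installed; @njit is the nopython-mode spelling of @jit):

    import numpy as np
    from numba import njit   # njit == jit(nopython=True)

    @njit
    def total(xs):
        acc = 0.0
        for x in xs:          # a plain loop, compiled to machine code on first call
            acc += x
        return acc

    xs = np.random.rand(1_000_000)
    print(total(xs))          # first call pays compilation, later calls run at native speed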
These features have one thing in common: they're only useful for prototype-quality throwaway code, if at all. Once your needs shift to an increased focus on production use and maintainability, they become serious warts. It's not just about performance (though it's obviously a factor too), there's real reasons why most languages don't do this.
As a matter of practice: the python community disagrees strongly. And the python community ate the world.
It's fine to have an opinion, but you're not going to change python.
Better things are possible, and I'm hoping that higher average quality of Python code is one of those things.
True, the view you express here has strong support in the community and possibly in the steering committee.
But there are differing ideas on what python is and why it's successful.
It's exactly the opposite! I'm saying that python is BIG AND DIVERSE and that attempts like SPy to invent a new (monolithic!) subset language that everyone should use instead are doomed, because it won't meet the needs of all the yahoos out there doing weird stuff the SPy authors didn't think was important.
It's fine to have "differing ideas on what python is", but if those ideas don't match those of all of the community, and not just what you think are the good parts, it's not really about what "python" is, is it?
The subset I've been working with is even narrower. Given my stance on pattern matching, it may not even be a subset.
https://github.com/py2many/py2many/blob/main/doc/langspec.md
import ctypes
ten = 10
addr = id(ten)
class PyLongObject(ctypes.Structure):
_fields_ = [
("ob_refcnt", ctypes.c_ssize_t),
("ob_type", ctypes.c_void_p),
("ob_size", ctypes.c_ssize_t),
("ob_digit", ctypes.c_uint32 * 1),
]
long_obj = PyLongObject.from_address(addr)
long_obj.ob_digit[0] = 3
assert 10 == 3
# using an auxiliary variable to prevent any inlining
# done at the interpreter level before actually querying
# the value of the literal `10`
x = 3
assert 10 * x == 9
assert 10 + x == 6
The absurd example of overwriting the literal `10` is "obviously" bad, but your assertion that the interpreter should be able to assume nobody is overwriting its memory isn't borne out in practice.
What, mutating the data representation of built-in types documented to be immutable? For what purpose?
Eventually LLMs might even generate executables directly.
A decent case of Python 4.0?
> So, maybe, "a JIT compiler can solve all of your problems"; they can go a long way toward making Python, or any dynamic language, faster, Cuni said. But that leads to "a more subtle problem". He put up a slide with a trilemma triangle: a dynamic language, speed, or a simple implementation. You can have two of those, but not all three.
This trilemma keeps getting me back towards Julia. It's less simple than Python, but much faster (mitigated by pre-compilation time), and almost as dynamic. I'm glad this language didn't die.
I definitely agree with this eventually, but for now why not just let developers set `dynamic=False` on objects and make it opt in? This is how Google handles breaking Angular upgrades, and in practice it works great because people have multiple years to prepare for any breaking changes.
I think "Python 4.0" is going to have to be effectively a new language by a different team that simply happens to bear strong syntactic similarities. (And at least part of why that isn't already happening is that everyone keeps getting scared off by the scale of the task.)
Thanks for the reminder that I never got around to checking out Julia.
Personally I'd be more interested in designing from scratch.
I love Python. It's amazing with uv; I just implemented a simple CLI this morning for analyzing data with inline dependencies that's absolutely perfect for what I need and is extremely easy to write, run, and tweak.
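That inline-dependencies trick is the PEP 723 script-metadata block that uv understands; a minimal sketch (the file name and the polars dependency are just examples):

    # /// script
    # requires-python = ">=3.12"
    # dependencies = ["polars"]
    # ///
    import polars as pl

    # Run with: uv run analyze.py -- uv resolves and installs polars on the fly.
    df = pl.read_csv("data.csv")   # hypothetical input file
    print(df.describe())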
Based on previous experience, I would not suggest Python should be used for an API server where performance - latency, throughput - and scalability of requests is a concern. There's lots of other great tools for that. And if you need to write an API server and it's ok not to have super high performance, then yeah Python is great for that, too.
But it's great for what it is. If they do make a Python 4.0 with some breaking changes, I hope they keep the highly interpreted nature such that something like Pydantic continues to work.
I don't understand how we had super dynamic systems decades ago that were easier to optimize than people care to understand. Heaven help folks if they ever get a chance to use Mathematica.
https://docs.modular.com/mojo/why-mojo/#a-member-of-the-pyth...
Wrong question
Maybe something like, "Python startup time is as fast as other interpreters"
Comparatively, Python (startup time) is slow(er)
Examples such as Numba JIT for numerical computation, Bodo JIT/dataframes for data processing, and PyTorch for deep learning demonstrate this clearly. Python’s flexible syntax enables creating complex objects and their operators such as array and dataframe operations, which these compilers efficiently transform into code approaching C++-level performance. DSL operator implementations can also leverage lower-level languages such as C++ or Rust when necessary. Another important aspect not addressed in the article is parallelism, which DSL compilers typically handle quite effectively.
Given that data science and AI are major use cases for Python, compilers like Numba, Bodo, and PyTorch illustrate how many performance-critical scenarios can already be effectively addressed. Investing further in DSL compilers presents a practical pathway to enhancing Python’s performance and scalability across numerous domains, without compromising developer usability and productivity.
Disclaimer: I have previously worked on Numba and Bodo JIT.
The conclusion is logically flawed: it conflates language popularity with performance, and conference attendance and widespread use are sociological indicators, not evidence of Python's performance. Conflating the two is intellectually negligent.
Additionally, Python's speed is largely due to C extensions handling performance-critical tasks, not the interpreter itself. Perl, however, is often faster even in pure code, especially for text processing and regex, thanks to its optimized engine, making it inherently quicker in many common scenarios.
So let's see what remains from the CPython performance efforts.