My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:
- Complicates the implementation of the interpreter
- Complicates C extensions, and
- Causes single-threaded code to run slower
Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading, people don't reach for threads as much. So by its nature it exposes more bugs.
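To make that concrete, here's a minimal sketch (hypothetical code, not from any particular project) of the kind of data race that is already possible under the GIL but far easier to actually hit once threads run in parallel:

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        # Not atomic: the load, add, and store are separate bytecode
        # instructions, so another thread can interleave between them
        # even with the GIL; without it, the window is much wider.
        counter += 1

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 400000; how often updates are lost varies by CPython version
# and build, but the fix is the same either way: guard shared state.
print(counter)
```

The same code with a `threading.Lock` around the increment is correct on both builds, which is the sense in which the guarantees haven't changed: code that was only accidentally correct just gets caught out more often.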
Now, my rants:
> have any other effects on multi-threaded Python code
It removes the need for multi-process workarounds. Hence, it simplifies user code. IMO totally worth it to make the interpreter more complex.
> Complicates C extensions
The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, has stated that it can't and doesn't want to support sub-interpreters. On the contrary, it already supports free-threading today and is actively sorting out the remaining bugs.
> Causes single-threaded code to run slower
That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.
Maybe. I would expect that 99% of python code going forward will still be single threaded. You just don’t need that extra complexity for most code. So I would expect that python code as a whole will have worse performance, even though a handful of applications will get faster.
And if there's a good free-threaded HTTP server implementation, the RPS of "Python code as a whole" could increase dramatically.
free-threaded makes sense if you need shared state.
Is it because Rust is just fast? Nope. For anything after resolving dependency versions, raw CPU performance doesn't matter at all. It's that writing concurrent plus parallel code in Rust is easier: it doesn't need to spawn a few processes and wait for the interpreter to start in each, and it doesn't need to constantly serialize whatever you want to run. So, someone did it!
Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.
Wow. Could you elaborate?
A 1% slowdown seems totally fine. A 9% slowdown is pretty bad.
How are 'concurrent.futures' users impacted? What will I need to change moving forward?
Things they won't tell you at PyCon.
It's a big project that's going to take lots of time by lots of people to finish. Keep it behind opt-in, keep accepting pull requests after rigorous testing, and it's fine.
https://www.linkedin.com/posts/mdboom_its-been-a-tough-coupl...
Let's see whether performance improvements still land on CPython, unless another company sponsors the work.
I guess Facebook (no need to correct me on the name) is still sponsoring part of it.
A few decades ago MS did indeed have a playbook which they used to undermine open standards. Laying off some members of the Python team bears no resemblance whatsoever to that. At worst it will delay the improvement of free-threaded Python. That's all.
Your comment is lazy and unfounded.
* VSCode got popular and they started preventing forks from installing its extensions.
* They extended the open-source pyright language server into the proprietary pylance. They don’t even sell it. It’s just there to make the FOSS version less useful.
* They bought GitHub and started rate-limiting it for logged-out visitors.
Every time Microsoft touches a thing, they end up locking it down. They can’t help it. It’s their nature. And if you’re the frog carrying that scorpion across the pond and it stings you, well, you can only blame it so much. You knew this when they offered the deal.
Every time. It hasn’t changed substantially since they declared that Linux is cancer, except to be more subtle in their attacks.
There's a part of me that wants to scream at them:
"Look around you!!! It's not 1999 anymore!!! These days we have Google, Amazon, Apple, Facebook, etc, which are just as bad if not worse!!! Cut it out with the 20+ year old bad jokes!!!"
Yes, Microsoft is bad. The reason Micro$oft was the enemy back in the day is because they... won. They were bigger than anyone else in the fields that mattered (except for server-side, where they almost won). Now they're just one member of a gang of evils. There's nothing special about them anymore. I'm more scared of Apple and Google.
But the thing is that Microsoft hasn’t seemed to fundamentally change since 1999. They appear kinder and friendlier but they keep running the same EEE playbook everywhere they can. Lots of us give them a free pass because they let us run a nifty free-for-now programming editor. That doesn’t change the leopard’s spots, though.
MS has continued to metastasize and is in some ways worse than the old days, even if they’ve finally accepted the utility of open source as a loss leader.
They have the only BigTech products I’ve been forced to use if I want to eat.
I’m not anti-MS as much as anti their behavior, whoever is acting that way. This thread is directly related to MS so I’m expressing my opinion on MS here. I’ll be more than happy to share my thoughts on Chrome in a Google thread.
Sabotaging forks is scummy, but the forks were extending MS functionality, not the other way around.
GitHub was a private company before it was bought by MS. Rate limiting is.... not great, but certainly not an extinguish play.
EEE refers to the subversion of open standards or independent free software projects. It does not apply to any of the above.
MS are still scummy but at least attack them on their own demerits, and don't parrot some schtick from decades ago.
So even without EEE, I think it’s supremely risky to hitch your wagon to their tech or services (unless you’re writing primarily for Windows, which is what they’d love to help you migrate to). And I can’t be convinced the GitHub acquisition wasn’t some combination of these dark patterns.
Step 1: Get a plurality of the world’s FOSS into one place.
Step 2: Feed it into a LLM and then embed it in a popular free editor so that everyone can use GPL code without actually having to abide the license.
Step 3: Make it increasingly hard to use for FOSS development by starting to add barriers a little at a time. <= we are here
As a developer, they’ve done nothing substantial to earn my trust. I think a lot of Microsoft employees are good people who don’t subscribe to all this and who want to do the right thing, but corporate culture just won’t let that be.
OK, finally, yes, this is very true, for specific parts of their tech.
But banging on about EEE just distracts from this, more important message.
> Make it increasingly hard to use for FOSS development by starting to add barriers a little at a time. <= we are here
....and now you've lost me again
Hanlon’s razor is a thing, and I generally follow it. It’s just that I’ve seen Microsoft make so many “oops, our bad!” mistakes over the years that purely coincidentally gave them an edge up over their competition, that I tend to distrust such claims from them.
I don’t feel that way about all corps. Oracle doesn’t make little mistakes that accidentally harm the competition while helping themselves. No, they’ll look you in the eye and explain that they’re mugging you while they take your wallet. It’s kind of refreshingly honest in its own way.
Fucking hell bud :D
A classic example is ActiveX.
Nah, even that was based on earlier MS technologies - OLE and COM
A good starter list of EEE plays is on the wikipedia page: https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...
> Examples by Microsoft
> Browser incompatibilities
> The plaintiffs in an antitrust case claimed Microsoft had added support for ActiveX controls in the Internet Explorer Web browser to break compatibility with Netscape Navigator, which used components based on Java and Netscape's own plugin system.
You meant they used ActiveX in an EEE play in the browser wars.
Additionally, at this stage the severe political and governance problems cannot have escaped Microsoft. I imagine that no competent Microsoft employee wants to give his expertise to CPython, only later to suffer group defamation from a couple of elected mediocre people.
CPython is an organization that overpromises, allocates jobs to the obedient and faithful while weeding out competent dissenters.
It wasn't always like that. The issues are entirely self-inflicted.
This stinks of BS
Agreed.
> and communicating across processes often requires making expensive copies of data
SharedMemory [0] exists. Never understood why this isn’t used more frequently. There’s even a ShareableList which does exactly what it sounds like, and is awesome.
[0]: https://docs.python.org/3/library/multiprocessing.shared_mem...
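A minimal sketch of the `ShareableList` flavor (hypothetical values; keep in mind items are fixed-size, so strings can't grow past their original length):

```python
from multiprocessing import Process, shared_memory

def worker(name):
    # Attach to the parent's list by name -- no pickling of the payload,
    # no per-process copy of the data.
    sl = shared_memory.ShareableList(name=name)
    sl[0] += 1                  # the parent sees this immediately
    sl.shm.close()

if __name__ == "__main__":
    sl = shared_memory.ShareableList([0, "hello", 3.14])
    p = Process(target=worker, args=(sl.shm.name,))
    p.start()
    p.join()
    print(sl[0])                # 1
    sl.shm.close()
    sl.shm.unlink()             # free the segment once everyone is done
```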
If you want to share structured Python objects between instances, you have to pay the cost of `pickle.dumps`/`pickle.loads` (CPU overhead for interprocess communication) + the memory cost of replicated objects in the processes.
Neither solve the copying problem, though.
I wonder why people never complained so much about JavaScript not having shared-everything threading. Maybe because JavaScript is so much faster that you don't have to reach for it as much. I wish more effort was put into baseline performance for Python.
This is a fair observation.
I think a part of the problem is that the things that make GIL-less Python hard are also the things that make faster baseline performance hard, i.e. the ecosystem's over-reliance on the shape of the CPython data structures.
What makes Python different is that a large percentage of Python code isn't Python, but C code targeting the CPython API. This isn't true for a lot of other interpreted languages.
Nobody sane tries to do math in JS. Backend JS is recommended for situations where processing is minimal and it is mostly lots of tiny IO requests that need to be shunted around.
I'm a huge JS/Node proponent and if someone says they need to write a backend service that crunches a lot of numbers, I'll recommend choosing a different technology!
For some reason Python peeps keep trying to do actual computations in Python...
Spawning a PYTHON interpreter process might take 30 ms to 300 ms before you get to main(), depending on the number of imports
It's 1 to 2 orders of magnitude difference, so it's worth being precise
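A rough way to put numbers on it on your own machine (the numpy line is just an example of a heavy import and assumes numpy is installed):

```python
import subprocess
import sys
import time

def startup_ms(code: str) -> float:
    # Time a full interpreter spawn: process creation + startup + imports.
    t0 = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return (time.perf_counter() - t0) * 1000

print(f"bare interpreter: {startup_ms('pass'):6.0f} ms")
print(f"import numpy:     {startup_ms('import numpy'):6.0f} ms")
```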
This is a fallacy with, say, CGI. A CGI program in C, Rust, or Go works perfectly well.
e.g. sqlite.org runs with a process PER REQUEST - https://news.ycombinator.com/item?id=3036124
It depends on whether one uses clone, fork, posix_spawn etc.
Fork can take a while depending on the size of the address space, number of VMAs etc.
In contrast, no one thinks about what happens if a thread dies independently because the failure mode is joint.
In Rust if a thread holding a mutex dies the mutex becomes poisoned, and trying to acquire it leads to an error that has to be handled. As a consequence every rust developer that touches a mutex has to think about that failure mode. Even if in 95% of cases the best answer is "let's exit when that happens".
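For contrast, Python's `threading.Lock` has no notion of poisoning at all. A small untested sketch of what happens when the owning thread dies:

```python
import threading

lock = threading.Lock()

def worker():
    lock.acquire()                       # deliberately no context manager
    raise RuntimeError("thread died while holding the lock")

t = threading.Thread(target=worker)
t.start()
t.join()                                 # the exception killed only that thread

# Nothing is poisoned and nothing is released: a later lock.acquire()
# would simply block forever, and no error forces you to think about it.
print("still held:", lock.locked())      # True
```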
The operating system tends to treat your whole process as one and shut down everything or nothing. But a thread can still crash on its own due to unhandled OOM, assertion failures, or any number of other issues.
That's not really true on POSIX. Unless you're doing nutty things with clone(), or you actually have explicit code that calls pthread_exit() or gettid()/pthread_kill(), the whole process is always going to die at the same time.
POSIX signal dispositions are process-wide, the only way e.g. SIGSEGV kills a single thread is if you write an explicit handler which actually does that by hand. Unhandled exceptions usually SIGABRT, which works the same way.
** Just to expand a bit: there is a subtlety in that, while dispositions are process-wide, one individual thread does indeed take the signal. If the signal is handled, only that thread sees -EINTR from a blocking syscall; but if the signal is not handled, the default disposition affects all threads in the process simultaneously no matter which thread is actually signalled.
If you're running in something like AWS Fargate, there is no shared memory. You have to use the network and file system, which adds a lot of latency, way more than spawning a process.
Copying processes through fork is a whole different problem.
Green threads and an actor model will get you much further, in my experience.
Decent threading is awesome news, but it only affects a small minority of use cases. Threads are only strictly necessary when it's prohibitive to message-pass. The Python ecosystem these days includes a playbook solution for literally any such case. Considering the multiple major pitfalls of threads (i.e., locking), they are likely to remain useful only in specific libraries/domains and not as a general-purpose tool.
Additionally, with all my love for vanilla Python, anyone who needs to squeeze the juice out of their CPU (which is actually memory bandwidth) has plenty of other tools -- off-the-shelf libraries written in native code. (Honorable mention to PyPy, Numba and such.)
Finally, the one dramatic performance innovation in Python has been async programming - I warmly encourage everyone not familiar with it to consider taking a look.
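If you haven't seen it, the core of it is just this (a toy sketch, with `asyncio.sleep` standing in for real network calls):

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for an HTTP request or DB query; while this task awaits,
    # the event loop runs the others.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # The three "requests" overlap, so this takes about 1s, not 3s.
    results = await asyncio.gather(fetch("a", 1.0), fetch("b", 1.0), fetch("c", 1.0))
    print(results)

asyncio.run(main())
```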
Python has a lot of solid workarounds for avoid threading because until now Python threading has absolutely sucked. I had naively tried to use it to make a CPU-bound workload twice as fast and soon realized the implications of the GIL, so I threw all that code away and made it multiprocessing instead. That sucked in its own way because I had to serialize lots of large data structures to pass around, so 2x the cores got me about 1.5x the speed and a warmer server room.
I would love to have good threading support in Python. It’s not always the right solution, but there are a lot of circumstances where it’d be absolutely peachy, and today we’re faking our way around its absence with whole playbooks of alternative approaches to avoid the elephant in the room.
But yes, use async when it makes sense. It’s a thing of beauty. (Yes, Glyph, we hear the “I told you so!” You were right.)
What's wrong?
- Fibers
- Green threads
- Coroutines
- Actors
- Queues (e.g. GCD)
- …
Basically you need to reason about what your thing will do. Separate concerns. Each thing is a server (microservice?) with its own backpressure.
They schedule jobs on a queue.
The jobs come with some context; I don’t care if it’s a closure on the heap or a fiber with a stack or whatever. JavaScript, being single-threaded with promises, wastefully unwinds the entire stack for each tick instead of saving context. With callbacks you can save context in closures. But even that is pretty fast.
Anyway then you can just load-balance the context across machines. Easiest approach is just to have server affinity for each job. The servers just contain a cache of the data so if the servers fail then their replacements can grab the job from an indexed database. The insertion and the lookup is O(log n) each. And jobs are deleted when done (maybe leaving behind a small log that is compacted) so there are no memory leaks.
Oh yeah, and whatever you store durably should be sharded and indexed properly, so practically unlimited amounts can be stored. Availability in a given shard is a function of replicating the data, and the economics of it is that the client should pay with credits every time they access it. You can even replicate on demand (like BitTorrent re-seeding) to handle spikes.
This is the general framework whether you use Erlang, Go, Python or PHP or whatever. It scales within a company and even across companies (as long as you sign/encrypt payloads cryptographically).
It doesn’t matter so much whether you use php-fpm with threads, or swoole, or the new kid on the block, FrankenPHP. Well, I should say I prefer the shared-nothing architecture of PHP and APC. But in Python, it is the same thing with eg Twisted vs just some SAPI.
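A minimal single-process sketch of that queue-plus-context idea (threads standing in for the servers, durability and sharding left out):

```python
import queue
import threading

jobs = queue.Queue(maxsize=100)   # bounded queue = backpressure for producers

def worker():
    while True:
        job = jobs.get()
        if job is None:           # sentinel: shut this worker down
            jobs.task_done()
            break
        try:
            # The job carries its own context, so the worker stays stateless.
            print(f"processing job {job['id']} for {job['ctx']['user']}")
        finally:
            jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(10):
    jobs.put({"id": i, "ctx": {"user": "alice"}})

jobs.join()                       # wait until every job has been processed
for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()
```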
You’re welcome.
Python the language is pretty bad. Python the ecosystem of libraries and tools has no equal, unfortunately.
Switching a language is easy. Switching a billion lines of library less so.
And the tragic part is that many of the top “python libraries” are just Python interfaces to a C library! But if you want to switch to a “better language” that fact isn’t helpful.
But changing the language in a brownfield project is hard. I love Go, and these days I don’t bother with Python if I know the backend needs to scale.
But Python’s ecosystem is huge, and for data work, there’s little alternative to it.
With all that said, JavaScript ain’t got shit on any language. The only good thing about it is Google’s runtime, and that has nothing to do with the language. JS doesn’t have true concurrency and is a mess of a language in general. Python is slow, riddled with concurrency problems, but at least it’s a real language created by a guy who knew what he was doing.
DHolzer•8h ago
OFC it would be nice to just write Python and have everything be 12x accelerated, but I don't see how there would not be any drawbacks that would interfere with what makes Python so approachable.
nottorp•7h ago
Considering everyone knew about the GIL, I'm thinking most people just wouldn't bother.
jerf•4h ago
I think if someone set out to write a new dynamic scripting language today, from scratch, multithreading it would not pose any particular challenge. Beyond the fact that it's naturally a difficult problem, I mean, but nothing special compared to the many other languages that have implemented threading. It's all about all that code from before the threading era that's the problem, not the threading itself. And Python has a loooot of that code.
rocqua•3h ago
Multithreaded code is incredibly hard to reason about. And reasoning about it becomes a lot easier if you have certain guarantees (e.g. this argument / return value always has this type, so I can always do this to it). Code written in dynamic languages more often lacks such guarantees, because of complicated signatures. That makes multithreaded code even harder to reason about, and increases the risk it poses.
ynik•4h ago
Even more fun: allocating memory could trigger Python's garbage collector, which would also run `__del__` functions. So every allocation was also a possible (but rare) thread switch.
The GIL was only ever intended to protect Python's internal state (esp. the reference counts themselves); any extension modules assuming that their own state would also be protected were likely already mistaken.
rowanG077•4h ago
> A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.
https://docs.python.org/3/faq/library.html#what-kinds-of-glo...
If this is not the case, please let the official Python team know their documentation is wrong. It does state that if Py_DECREF is invoked, all bets are off. But a ton of operations never do that.
imtringued•6h ago
The only code that is going to break because of "no GIL" is C extensions, and for very obvious reasons: you can now call into C code from multiple threads, which wasn't possible before but is now. Python code could always be called from multiple Python threads, even in the presence of the GIL.
OskarS•5h ago
The problems (as I understand it, happy to be corrected), are mostly two-fold: performance and ecosystem. Using fine-grained locking is potentially much less efficient than using the GIL in the single-threaded case (you have to take and release many more locks, and reference count updates have to be atomic), and many, many C extensions are written under the assumption that the GIL exists.
rocqua•3h ago
I wonder if companies will start adding this to their system prompts.
cess11•7h ago
https://www.youtube.com/watch?v=_9B__0S21y8 is fairly concise and gives some recommendations for literature and techniques, obviously making an effort in promoting PlusCal/TLA+ along the way but showcases how even apparently simple algorithms can be problematic as well as how deep analysis has to go to get you a guarantee that the execution will be bug free.
dotancohen•7h ago
Of course, while the transcription is in progress the rest of the UI (Qt via PySide) should remain usable. And multiple transcription requests should be supported - I'm thinking of a pool of transcription threads, but I'm uncertain how many to allocate. Half the number of CPUs? All the CPUs under 50% load?
Advice welcome!
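For reference, the shape I have in mind (a rough sketch; the function names and the half-the-cores heuristic are just placeholders to tune):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Heuristic only: leave roughly half the cores for the UI and the rest of the system.
MAX_WORKERS = max(1, (os.cpu_count() or 2) // 2)
pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)

def transcribe(path: str) -> str:
    ...  # CPU-heavy transcription call goes here

def on_new_request(path: str, deliver_to_ui):
    future = pool.submit(transcribe, path)
    # The callback runs on the worker thread: don't touch Qt widgets here,
    # hand the result back to the GUI thread (e.g. via a signal) instead.
    future.add_done_callback(lambda f: deliver_to_ui(f.result()))
```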
ptx•1h ago
I don't know how that's done in PySide, though. I couldn't find a clear example. You might have to use a QThread instead to handle it.
sgarland•6h ago
Use SharedMemory to pass the data back and forth.
bayindirh•7h ago
With the critical mass Python has acquired over the years, the GIL becomes a very sore bottleneck in some cases. This is why I decided to learn Go, for example: a properly threaded (and green-threaded) programming language which is higher-level than C/C++ but lower-level than Python, and which allows me to do things I can't do with Python. Compilation is another reason, but it was secondary to threading.
jillesvangurp•7h ago
What changes for you? Nothing unless you start using threads. You probably weren't using threads anyway because there is little to no point in python to using them. Most python code bases completely ignore the threading module and instead use non blocking IO, async, or similar things. The GIL thing only kicks in if you actually use threads.
If you don't use threads, removing the GIL changes nothing. There's no code that will break. All those C libraries that aren't thread safe are still single threaded, etc. Only if you now start using threads do you need to pay attention.
There's some threaded python code of course that people may have written in python somewhat naively in the hope that it would make things faster that is constantly hitting the GIL and is effectively single threaded. That code now might run a little faster. And probably with more bugs because naive threaded code tends to have those.
But a simple solution to address your fears: simply don't use threads. You'll be fine.
Or learn how to use threads. Because now you finally can and it isn't that hard if you have the right abstractions. I'm sure those will follow in future releases. Structured concurrency is probably high on the agenda of some people in the community.
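For the `concurrent.futures` question above: the API doesn't change at all; the difference is that CPU-bound work in a `ThreadPoolExecutor` can now actually use more than one core. A toy sketch (the `sys._is_gil_enabled()` check only exists on 3.13+, hence the guard):

```python
import sys
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic: under the GIL these threads take turns,
    # on a free-threaded build they can run on separate cores.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    if hasattr(sys, "_is_gil_enabled"):
        print("GIL enabled:", sys._is_gil_enabled())
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_bound, [2_000_000] * 4))
    print(sum(results))
```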
HDThoreaun•6h ago
I'm not worried about new code. I'm worried about stuff written 15 years ago by a monkey who had no idea how threads work and just read something on Stack Overflow that said to use threading. This code will likely break when run post-GIL. I suspect there is actually quite a bit of it.
bgwalter•6h ago
Most C extensions that will break are not written by monkeys, but by conscientious developers that followed best practices.
bayindirh•6h ago
Older code will break, but code breaks all the time. A language changes how something behaves in a new revision, and suddenly 20-year-old bedrock tools are getting massively patched to accommodate both the new and the old behavior.
Is it painful, ugly, unpleasant? Yes, yes and yes. However, change is inevitable, because some of the old behavior was rooted in the inability to do certain things with the technology of the day, and as those hurdles are cleared, we change how things work.
My father's friend told me that the length of a variable's name used to affect compile/link times. Now we can test whether we have memory leaks in Rust. That was impossible 15 years ago given the performance of the processors.
delusional•5h ago
No it does not. I hate that analogy so much because it leads to such bad behavior. Software is a digital artifact that does not degrade. With the right attitude, you'd be able to execute the same binary on new machines for as long as you desired. That is not true of organic matter that actually rots.
The only reason we need to change software is that we trade that off against something else. Instructions are reworked, because chasing the universal Turing machine takes a few sacrifices. If all software has to run on the same hardware, those two artifacts have to have a dialogue about what they need from each other.
If we didn't want the universal machine to do anything new, and we had a valuable product, we could just keep making the machine that executes that product. It never rots.
kstrauser•5h ago
But if you tried to compile it on today’s libc, making today’s syscalls… good luck with that.
Software “rots” in the sense that it has to be updated to run on today’s systems. They’re a moving target. You can still run HyperCard on an emulator, but good luck running it unmodded on a Mac you buy today.
dahcryn•5h ago
If software is implicitly built on a wrong understanding or on undefined behaviour, I consider it rotting when it starts to fall apart as those undefined behaviours get defined. We do not need to sacrifice a stable future because of a few 15-year-old programs. Let the people who care about the value those programs bring manage the update cycle and fix them.
bayindirh•3h ago
When you look from the program's perspective, the context changes and becomes unrecognizable, IOW, it rots.
When you look from the context's perspective, the program changes by not evolving and keeping up with the context, IOW, it rots.
Maybe we anthropomorphize both and say "they grow apart". :)
igouy•1h ago
We say the context is not backwards compatible.
indymike•4h ago
I'm thankful that it does, or I would have been out of work long ago. It's not that the files change (literal rot), it is that hardware, OSes, libraries, and everything else changes. I'm also thankful that we have not stopped innovating on all of the things the software I write depends on. You know, another thing changes - what we are using the software for. The accounting software I wrote in the late 80s... would produce financial reports that were what was expected then, but would not meet modern GAAP requirements.
rocqua•3h ago
Software doesn't rot, it remains constant. But the context around it changes, which means it loses usefulness slowly as time passes.
What is the name for this? You could say 'software becomes anachronistic'. But is there a good verb for that? It certainly seems like something that a lot more than just software experiences. Plenty of real world things that have been perfectly preserved are now much less useful because the context changed. Consider an Oxen-yoke, typewriters, horse-drawn carriages, envelopes, phone switchboards, etc.
It really feels like this concept should have a verb.
spookie•3h ago
I wish more things were like that. Tired of building things on shaky grounds.
actinium226•6h ago
I feel some trepidation about threads, but at least for debugging purposes there's only one process to attach to.
dhruvrajvanshi•1h ago
I was with OP's point but then you lost me. You'll always have to deal with that coworker's shitty code, GIL or not.
Could they make a worse mess with multi threading? Sure. Is their single threaded code as bad anyway because at the end of the day, you can't even begin understand it? Absolutely.
But yeah, I think Python people don't know what they're asking for. They think GIL-less Python is gonna give everyone free puppies.
dkarl•5h ago
Coming from the Java world, you don't know what you're missing. Looking inside an application and seeing a bunch of threadpools managed by competing frameworks, debugging timeouts and discovering that tasks are waiting more than a second to get scheduled on the wrong threadpool, tearing your hair out because someone split a tiny sub-10μs bit of computation into two tasks and scheduling the second takes a hundred times longer than the actual work done, adding a library for a trivial bit of functionality and discovering that it spins up yet another threadpool when you initialize it.
(I'm mostly being tongue in cheek here because I know it's nice to have threading when you need it.)
rbanffy•4h ago
A fairly common pattern for me is to start a terminal UI updating thread that redraws the UI every second or so while one or more background threads do their thing. Sometimes, it’s easier to express something with threads and we do it not to make the process faster (we kind of accept it will be a bit slower).
The real enemy is state that can be mutated from more than one place. As long as you know who can change what, threads are not that scary.
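Something like this, roughly (a toy sketch; the timings and the progress dict are placeholders):

```python
import threading
import time

progress = {"done": 0, "total": 100}
finished = threading.Event()

def ui():
    # Redraw a one-line status roughly once per second until the work is done.
    while not finished.wait(timeout=1.0):
        print(f"\r{progress['done']}/{progress['total']}", end="", flush=True)
    print(f"\r{progress['done']}/{progress['total']}")

def worker():
    for _ in range(progress["total"]):
        time.sleep(0.05)          # stand-in for the real work
        progress["done"] += 1     # only this thread mutates it; the UI only reads

ui_thread = threading.Thread(target=ui)
ui_thread.start()
worker()
finished.set()
ui_thread.join()
```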
tialaramex•6h ago
In a language conceived for this kind of work it's not as easy as you'd like. In most languages you're going to write nonsense which has no coherent meaning whatsoever. Experiments show that humans can't successfully understand non-trivial programs unless they exhibit Sequential Consistency - that is, they can be understood as if (which is not reality) all the things which happen do happen in some particular order. This is not the reality of how the machine works, for subtle reasons, but without it merely human programmers are like "Eh, no idea, I guess everything is computer?". It's really easy to write concurrent programs which do not satisfy this requirement in most of these languages, you just can't debug them or reason about what they do - a disaster.
As I understand it Python without the GIL will enable more programs that lose SC.
almostgotcaught•4h ago
"Python programmers are so incompetent that Python succeeds as a language only because it lacks features they wouldn't know to use"
Even if it's circumstantially true, doesn't mean it's the right guiding principle for the design of the language.
frollogaston•2h ago
I don't fully understand the challenge with removing it, but thought it was something about C extensions, not something most users have to directly worry about.