The GIL problem won't be solved by throwing engineer hours at it (not even Meta engineers). Fundamentally every single piece of python code ever written will have to stop and now worry about potential race conditions with innocent things like accessing a dictionary item or incrementing a value. It's a massive education and legacy code problem at least on the same scale as the python 2 to 3 migration.
I honestly don't think the community is ready for this change and don't expect to ever see stock cpython drop the GIL--perhaps there will be a flag to selectively operate a python process without the GIL (and leave the onus on you to completely test and validate your code works with potentially new and spooky behaviour).
That is a misconception. The GIL protects the internal state of Python. It doesn't make all multi-threaded Python code "safe".
PEP 703 still preserves many of the current characteristics, such as the behaviour of dictionary reads and writes.
Correct. Python multi-threaded code is not magically thread-safe. What the GIL did do is make C functions atomic, like dictionary and list methods, since they're implemented by C functions. The reason is that the GIL is released and re-acquired while executing Python code (every 100 ops in old versions, if memory serves; since 3.2 it's a time-based switch interval, 5 ms by default), but isn't released by most C functions, so they run with the global lock held, which makes them atomic. You can still have a thread switch between any two Python op codes, e.g. foo += 1 may or may not be atomic depending on how many op codes it compiles to (IIRC, in CPython, it is not atomic).
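To make the op code point concrete, here is a small sketch using the standard `dis` module to list what a single `counter += 1` compiles to. The names `counter` and `increment` are made up for illustration, and the exact instruction names differ between CPython versions, so treat the printed list as indicative only.

```python
import dis

counter = 0


def increment():
    global counter
    counter += 1  # one source line, several bytecode instructions


# Collect the instruction names; a thread switch can land between any two.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

On the versions I'm aware of, the load of `counter` and the store back to it are separate instructions, which is exactly the gap another thread can slip into.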
One of the reasons I love using gevent (https://www.gevent.org) is that it's a way of introducing concurrency that does preserve these atomicity guarantees! Broadly speaking, the only time your code will be interrupted by something else in the same process is if some function call yields control back to the event loop - so if your block is entirely made up of code you control that doesn't do any I/O, you can be sure it will run atomically. And you get it for free without needing to rewrite synchronous code into asyncio.
This does make me wonder, though, if gevent will survive PEP 703's massive changes to internal Python assumptions. That said, gevent does work on pypy, so there's some history with it being flexible enough to port. Hopefully it won't be left behind, otherwise codebases using gevent will see this as a Python 2 -> 3 level apocalypse!
Ah - it's more that you wouldn't typically need to run multiple threads in the same process to handle concurrent requests. For instance, gunicorn used with gevent workers will typically fork processes to ensure all cores are used, but wouldn't require multiple threads per process - gevent would handle concurrency on a single OS thread within each process.
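For anyone who hasn't seen that setup, a minimal sketch of a gunicorn config file (conventionally `gunicorn.conf.py`) might look like the following. `workers`, `worker_class` and `worker_connections` are real gunicorn settings; the specific values are assumptions for illustration, not a recommendation.

```python
# gunicorn.conf.py (illustrative values, not a tuned production config)
import multiprocessing

# One worker process per core, so every core gets used (assumed sizing).
workers = multiprocessing.cpu_count()

# Each worker handles concurrency with gevent greenlets on one OS thread.
worker_class = "gevent"

# Upper bound on simultaneous connections per worker (assumed value).
worker_connections = 1000
```

Parallelism comes from the forked processes, and concurrency inside each process comes from gevent, so no process needs more than one OS thread.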
As soon as you leave python, that C extension can give up the GIL and your other python threads can start running.
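A rough way to see this from pure Python: `time.sleep` is implemented in C and releases the GIL while it waits, so it can stand in for any extension call that does the same. The helper names and the 0.2 second duration below are arbitrary choices for the sketch.

```python
import threading
import time


def blocking_call():
    # time.sleep releases the GIL while waiting, so other threads can run.
    time.sleep(0.2)


def run_threads(count=4):
    threads = [threading.Thread(target=blocking_call) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads()
print(f"4 threads x 0.2s of blocking took {elapsed:.2f}s")
```

If the GIL were held during the call, the four waits would have to run one after another; instead they overlap.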
The GIL makes python code have cooperative threading. It does not protect from e.g. your thread's view of state mutating when you make a database call.
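A sketch of that kind of bug, with a `threading.Barrier` standing in for the blocking database call so the interleaving is forced rather than left to luck. The `account` dict and `withdraw` function are hypothetical names for illustration.

```python
import threading

account = {"balance": 100}
approved = []

# Stand-in for a blocking call (e.g. a database query): each thread pauses
# here until the other one has also passed the balance check.
both_checked = threading.Barrier(2)


def withdraw(amount):
    if account["balance"] >= amount:   # check
        both_checked.wait()            # "I/O": the other thread runs meanwhile
        account["balance"] -= amount   # act on a check that is now stale
        approved.append(amount)


threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account["balance"], approved)
```

Both withdrawals get approved against a balance that only covers one of them, GIL or no GIL. The fix is a lock (or a database transaction) around the whole check-then-act sequence.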
I also believe it is best practice, not a requirement, not to mutate data in extension code without holding the GIL - but I have mucked with a lot of different extension APIs so I might be confused.
No, I wouldn’t call it cooperative threading, it’s still preemptive in that any Python thread can be switched with another at any instruction. That’s the same behavior as the operating system with the same potential for race conditions (except python instructions are higher level than machine code instructions.)
While C extensions can release the GIL, that only makes sense if they do enough work that a Python thread could get some things done in the meanwhile, and it wouldn’t be surprising to the caller. Obviously the C thread can’t interact with the Python world after the GIL has been released.
Having worked on moving a proprietary language from its own green threaded VM to the JVM, and working on TruffleRuby I’m saddened to see this FUD still being trotted out. The GIL and similar mechanisms do not make your code thread safe. It _might_ save you from a small set of concurrency bugs, but they are fewer than you might think, and mostly it will just make intermittent existing issues that little bit more obvious when you move to real threads. Occasionally we would need to fix something in a core library or add a mutex, but those bugs could often be seen in a stress test with green threads or Ruby’s GVL.
My guess is the GIL or smaller mutexes will be needed for C extensions and a few other areas, but it’s also likely that could be moved to an opt in mechanism over time.
I don't get this. Just because you don't have the GIL doesn't mean that your previously single threaded code is now multithreaded and stepping on itself.
From previous discussions it's my understanding the c integration is going to be the cause for the issues.
From a python perspective it wouldn't necessarily be a big change, but everything can branch out to c, and there you're going to get in trouble with shared memory.
The most significant impact of this change is that python threads can run at the same time. Before, calls to C API were essentially cooperative multithreading yield points - even if you have 100 python threads, they can't run concurrently under the GIL. 99 are blocked waiting on the GIL.
C extensions have always been able to use real threading. But now there is no GIL to synchronize with the python interpreter on ingress/egress from the extension.
No, but it means that your previous multithreaded code is no longer automatically prevented from stepping all over itself by having multiple threads accessing the same data:
GIL removal doesn't mean "make all of this lockless", it means "replace the GIL with fine-grained locking". So those problems are still solved for Python code. The three issues are the amount of work it takes to do right, the performance cost for single-threaded code, and the C API.
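As a Python-level picture of what "fine-grained" means here (the real work happens in C inside the interpreter, and the PEP describes per-object locks for built-in containers), consider a hypothetical `LockedDict` that carries its own lock instead of relying on one global lock. Everything in this sketch is illustrative, not CPython's implementation.

```python
import threading


class LockedDict:
    """Hypothetical mapping that carries its own lock (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def __len__(self):
        with self._lock:
            return len(self._data)


def fill(target, prefix, count):
    for i in range(count):
        target.set(f"{prefix}{i}", i)


a, b = LockedDict(), LockedDict()
threads = [
    threading.Thread(target=fill, args=(a, "x", 1000)),
    threading.Thread(target=fill, args=(a, "y", 1000)),  # contends on a's lock
    threading.Thread(target=fill, args=(b, "z", 1000)),  # never waits on a's lock
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(a), len(b))
```

Each individual operation stays atomic, but threads touching different objects no longer queue up behind the same lock.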
I'm simultaneously scared and enlightened seeing all these comments acting as if the GIL is some magic "makes your code thread/concurrency safe" panacea. I always saw it as a shim/hack to make cpython specifically easier to implement, not something that inherently makes code more thread safe or atomic. It's just more work to do things "the right way" across application boundaries, but from my understanding this PEP is Meta committing to do that work.
Removing the lock creates problems in existing code in practice. This is an ecosystem that has less focus on standards and more on "CPython is the reference implementation".
What non-transparent GIL specific behavior are developers relying on exactly?
When I say GIL specific behavior, I mean "python code that specifically requires a GLOBAL interpreter lock to function properly"
Not something that simply requires atomic access or any of the guarantees that the GIL has thus far provided, but, like, code that specifically requires GIL-like behavior above any CPython implementation details that could be implemented with more fine-grained concurrency assurances?
I've seen some really cursed python in my days, like checking against `locals()` to see if a variable was defined à la JavaScript's 'foo in window' syntax (but I suppose more portable), but I can't recall anything specifically caring about a global interpreter lock (instead of what the GIL has semantically provided, which is much different).
> What non-transparent GIL specific behavior are developers relying on exactly?
They are relying on behavior of a single environment. We similarly see a lot of issues moving threaded code to a new operating system in C/C++, because the underlying implementation details (cooperative threading, m:n hybrid threading, cpu-bound scheduling, I/O and signaling behavior) will hide bugs which came from mistakes and assumptions.
finally. you don't even have to read anything to work this out -- if things like dictionary access were no longer atomic that would imply that threaded code without locks could crash the interpreter, which isn't going to happen.
> GIL removal doesn't mean "make all of this lockless"
Literally speaking, that's exactly what "removal" means. As far as I can tell, GP was wondering why there's so much discussion about replacement, since simply removing the GIL wouldn't break single-threaded code.
It is solvable by making the hard decision to move to Python 4 with no backward compatibility. The two core issues imo in Python are the GIL and the environment hell and both simply can’t be solved while still keeping the 3 moniker. We’re in a field of constant workarounds and duct tape because we try pleasing too much
Python tried that (version 2 to 3) and both the community and dev team were traumatized by the effects enough they've publicly said it'll never happen again.
That means they didn't learn from it at all. The problem with Python 2 to python 3 is that it lost backwards compatibility because of very silly reasons like turning the print statement into a function. The vast majority of the problems could have been avoided by not making pointless changes with dubious benefits.
I seriously doubt anyone had problems fixing print as a statement. 2to3 fixed it...
I'll admit that, yes, changing string to bytes and unicode to string was a bit annoying, but the change itself wasn't fundamentally 'of dubious benefit', it did have benefits, and related to this, the only major issue was that you couldn't, for a long time, have code that worked in both where it came to literals. The biggest problem here was the implicit conversion from 2, that I agree needed to go.
Most of the other things can be trivially fixed automatically, or at least detected automatically, but without type hinting, it wasn't really easy to fix the automatic conversion.
There were other changes that were a bit tricky, but the majority of issues stemmed from the str/bytes change.
> perhaps there will be a flag to selectively operate a python process without the GIL (and leave the onus on you to completely test and validate your code works with potentially new and spooky behaviour).
Worked for ruby. The original interpreter MRI has a GIL too. Rubinius and JRuby added multi-threading with limited amounts of pain and people fixed libraries over the years. Sometimes just sprinkling lock blocks around particular FFI calls or only doing them from a dedicated thread will do the job.
Getting rid of the GIL would warrant the release of Python 4.0 for me, except the Python project shouldn't be supporting two different branches for as long as they supported 2.7.
I imagine there would need to be some kind of annotation to enable the GIL for a method and all of the code it calls, including libraries, so performant Python can take advantage of the lack of a GIL but old code doesn't break. Then all you need to do to maintain compatibility is to annotate your main() and your code should remain compatible for a while.
After all, the referenced PEP explicitly calls for making the GIL optional, not for removing it completely.
>Python project shouldn't be supporting two different branches for as long as they supported 2.7.
That wasn't the problem. The problem was not giving people the ability to make their code python 3 compatible while they were still stuck with python 2. The python 3 interpreter should have had a python 2 mode that gives you warnings.
Then just mark the extensions that are compatible with running without the GIL. You would also have a switch that disables the GIL, controllable with an environment variable or launch option.
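For what it's worth, this is roughly the shape of what later shipped as an experimental option in CPython 3.13: a separate free-threaded build where the GIL can be toggled with the `PYTHON_GIL` environment variable or the `-X gil=0` / `-X gil=1` launch option. A small sketch for checking the state at runtime follows; `gil_status` is a made-up helper, and the fallback branch assumes a CPython older than 3.13.

```python
import sys


def gil_status():
    # sys._is_gil_enabled() exists on CPython 3.13+. Assumption: if it is
    # missing, this is an older CPython, which always runs with the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    if check is None:
        return "enabled (interpreter predates the free-threaded build)"
    return "enabled" if check() else "disabled"


print(gil_status())
```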
The GIL only protected you during any of those operations, so you can still switch threads waiting between LOAD_FAST and STORE_FAST and have a race.
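The usual way to close that window, with or without a GIL, is an explicit lock around the read-modify-write. A minimal sketch, where the names `counter`, `counter_lock` and `add_many` are made up for illustration:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # The lock covers the load, the add, and the store as one unit.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock held across the whole increment, no update can be lost, so the total always comes out to the number of threads times the iterations per thread.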
There are a lot of new race conditions to be worried about with the GIL removal, but there's already too much misinformation out there about the GIL; let's not spread this one even further.
> Fundamentally every single piece of python code ever written will have to stop and now worry about potential race conditions with innocent things like accessing a dictionary item or incrementing a value
Not really, just make those operations atomic or have automatic locking
It will be optional. You don't have to worry about it, set the option to ON and for those who can worry about it, they will have the option to set it to OFF.
The problem is, code that relies on a lot of 3rd party libraries and throws the ON-switch for nogil will suddenly depend on all these libraries' maintainers having worried about this.
We can set it up such that if a module imports modules that don't support nogil, nogil will be automatically disabled for them too.
So, library designers will be under pressure to update their libraries to enable support. We could also have code tools that detect patterns that aren't GIL safe and throw out loud warnings
Why should library authors be put under pressure because someone else chose the wrong tool for the job, and is now trying to push that externality on to the community?
Python is a single-threaded language. That’s part of its DNA. The community has already been through one traumatic transition in recent history and the appetite for another one is low.
Library authors should not update their libraries to support multithreading, rather the people who want that should be forced to rewrite their code in a language that is more suitable for the problem they want to solve.
I disagree that single threadedness is in its DNA. It's an implementation detail of CPython. There are other implementations which don't have a GIL even today.
Would removing the GIL be a big change for CPython? Yes. But IMO it's worth it
> Python is a single-threaded language. That’s part of its DNA.
If the changes proposed in this PEP go through, that will no longer be the case. So library authors pretty much will have to either update, or see their modules wither the same way they would if they weren't updated as new Python 3.x versions come along.
> rather the people who want that should be forced to rewrite their code in a language that is more suitable for the problem they want to solve
The vast majority of the Python developer community WANT Python to support true multithreading and being able to solve these problems. We expend inordinate amounts of time and skill trying to make Python work around the GIL, eg. by utilizing multiprocessing.
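For readers who haven't had to do this, the multiprocessing workaround usually looks something like the sketch below, using the standard `concurrent.futures.ProcessPoolExecutor`. The function names and the workload are placeholders.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    # Placeholder CPU-bound work; threads would serialize this under the GIL.
    return sum(i * i for i in range(n))


def run_in_processes(inputs, max_workers=None):
    # Each input is pickled, sent to a worker process, and the result is
    # pickled back, which is the overhead threads would avoid.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sum_of_squares, inputs))
```

You would call `run_in_processes([...])` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. Anything passed in or out has to be picklable, and shared state needs explicit machinery, which is a big part of why people want real threads instead.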
We want Python to stay relevant long-term. In an age of abundant multi-core platforms, and workloads that can utilize them, the GIL is a major obstacle to that desire.
This is wrong, that ship sailed 10+ years ago. Python is used for almost everything, better get used to it.
This kind of garbage logic led to the horrible Ruby/Python/Javascript with C extensions split that makes their ecosystems very brittle, versus Java/C# where it is expected that things are fast enough without C and package management is much easier.
It's true that python is used for "almost everything", but it's only true because it plays nicely with C.
I understand the desire/demand for general purpose tools. The thing is, there are always tradeoffs. Acknowledging the tradeoffs and designing more specialized tools that work well together isn't necessarily garbage logic.
> Python is a single-threaded language. That’s part of its DNA.
Boooo. Maybe if all you do is ops scripts, but for those of us in data science this couldn't be further from the truth imo.
I'm not saying the syntax is easy for asyncio, or anywhere near as nice as golang or even kotlin is for concurrency, but it's definitely workable in a concurrent environment.
Yeah, I'm not a full-time developer but even for my bits of scripting I'd be wary of another big change in Python.
Personally I'd much rather see this effort go towards a new language which "feels" like Python but adopts more of the development experience of Go and Rust. From my tinkering it seems like Nim might already be that language, in which case what is needed is investment in its package ecosystem.
The problem with Python 2 to python 3 is that python 3 was essentially a new Python 2-esque language instead of just being a major version bump.
If python 4 was no GIL python, then both Python and C would remain unchanged as a language.