Threading vs Multiprocessing in Python
This topic can be a bit confusing if you already know threads and multiprocessing from C, C++, or Java, so I hope it will be clearer after reading this.
Multiple Python Implementations?
Python has had several implementations over the years. The most commonly used one is CPython, which is developed by the Python core developers and the Python community and supported by the Python Software Foundation. It is effectively the version you download from www.python.org. As the name suggests, CPython is implemented in C.
So what is CPython?
https://stackoverflow.com/questions/17130975/python-vs-cpython
@Chiel from Stackoverflow
CPython is the original Python implementation. It is the implementation you download from Python.org. People call it CPython to distinguish it from other, later, Python implementations, and to distinguish the implementation of the language engine from the Python programming language itself.
The latter part is where your confusion comes from; you need to keep Python-the-language separate from whatever runs the Python code.
CPython happens to be implemented in C. That is just an implementation detail, really. CPython compiles your Python code into bytecode (transparently) and interprets that bytecode in a evaluation loop.
CPython is also the first to implement new features; Python-the-language development uses CPython as the base; other implementations follow.
There are other Python implementations too, such as Jython (written in Java), IronPython (C#), and PyPy (RPython), which are used in some parts of the Python community. Note that CPython does not translate your Python code to C. Instead, it runs an interpreter loop over the compiled bytecode. If translating Python to C is what you are after, look at Cython, which is an entirely different beast.
CPython implements a special mechanism called the GIL.
What is the GIL, aka the Global Interpreter Lock?
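To make the "compiles to bytecode, then interprets it" step concrete, here is a minimal sketch using the standard `dis` module. The function `add_one` is a made-up example, and the exact instructions printed differ between CPython versions, so treat the output as illustrative only.

```python
import dis


def add_one(x):
    # A tiny, hypothetical function whose bytecode we want to inspect.
    return x + 1


# Every function carries a code object that holds its compiled bytecode.
raw_bytecode = add_one.__code__.co_code           # raw bytes run by the interpreter loop
instructions = list(dis.get_instructions(add_one))  # decoded, human-readable form

if __name__ == "__main__":
    print(type(raw_bytecode), len(raw_bytecode))
    # Print a readable listing of the bytecode instructions.
    dis.dis(add_one)
```

Nothing here is translated to C; the listing is what CPython's evaluation loop steps through when `add_one` is called.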
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
This is an implementation decision: rather than making memory management thread-safe everywhere, CPython guards the interpreter with a single lock. Because so much existing Python code and so many C extensions rely on this behaviour, it cannot simply be changed overnight (which is one reason other implementations exist). In a nutshell, the GIL means that your so-called multithreaded Python code does not actually run Python bytecode in parallel: only one thread executes bytecode at a time. The exceptions are blocking or long-running operations that release the GIL, such as I/O, image processing, and heavy NumPy computations; for everything else that is interpreted, the GIL becomes the bottleneck. Removing the GIL has long been one of the most requested changes on the Python mailing lists, and it has proven hard to do without hurting single-threaded performance; an optional free-threaded build only arrived, as an experimental feature, in CPython 3.13. More on the GIL: https://wiki.python.org/moin/GlobalInterpreterLock
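A rough way to see the GIL at work is to time the same pure-Python loop run sequentially and then in threads. This is only a sketch: `count_down`, `run_sequential`, and `run_threaded` are names invented for this illustration, and the timings depend on your machine and Python build, so no expected numbers are given.

```python
import time
from threading import Thread


def count_down(n):
    # Pure-Python CPU-bound loop; it runs bytecode and so needs the GIL.
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    # Run the loop `workers` times, one after another, and return elapsed seconds.
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    # Run the loop in `workers` threads at once and return elapsed seconds.
    start = time.perf_counter()
    threads = [Thread(target=count_down, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 2_000_000, 4
    print("sequential: %.2fs" % run_sequential(n, workers))
    print("threaded:   %.2fs" % run_threaded(n, workers))
```

On a standard (GIL-enabled) CPython build, the threaded run is typically no faster than the sequential one, because the threads take turns holding the lock.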
So what’s the point of threads in Python then?
Threads in Python do not give you the parallelism they do in other languages, but they are still a useful tool when you have the right use case. Network scripts, I/O, and data-fetching tasks spend most of their time waiting for data, and while one thread waits, another can use the CPU, all without the cost of creating a new process (which saves memory and time). For CPU-intensive work, there is little benefit to using the threading module.
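The waiting case can be sketched with `time.sleep` standing in for a network call, since a sleeping thread releases the GIL much like a thread blocked on I/O. `fake_download` and `fetch_all` are hypothetical helpers written for this illustration, not part of any library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(delay):
    # Stand-in for a network request: the thread just waits, releasing the GIL.
    time.sleep(delay)
    return delay


def fetch_all(delays):
    # Run all the "downloads" in parallel threads; return results and elapsed seconds.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_download, delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = fetch_all([0.5, 0.5, 0.5, 0.5])
    print("waited on %d tasks in %.2fs" % (len(results), elapsed))
```

Because the waits overlap, the total time should be close to the longest single wait rather than the sum of all of them.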
When to Use Multiprocessing and When to Use Threads
A new thread is spawned inside an existing process, whereas a new process is independent of the process that created it. This makes threads faster to create than processes. Threads share all of the process's memory, which can be useful and efficient if your tasks need it. The flip side is that threads need a mutex (lock) to coordinate access to shared data, while processes do not, because each has its own memory space. Finally, there is one GIL shared by all threads (threading library) and one GIL per process (multiprocessing).
The following two code samples come from engineer-man; the link is at the end of the page.
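As a small illustration of the mutex point, the sketch below has several threads increment one shared counter under a `threading.Lock`. The names `add_many` and `run_counter` are made up for this example; the GIL alone does not make a read-modify-write like `+= 1` safe, which is why the lock is there.

```python
from threading import Thread, Lock


def add_many(state, lock, times):
    # Each increment is a read-modify-write on shared data, so guard it.
    for _ in range(times):
        with lock:
            state["count"] += 1


def run_counter(workers=4, times=10_000):
    # All threads see the same dict because threads share the process's memory.
    state = {"count": 0}
    lock = Lock()
    threads = [Thread(target=add_many, args=(state, lock, times))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print(run_counter())
```

With separate processes the `state` dict would not be shared at all; each process would work on its own copy unless you set up explicit sharing.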
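# threading example
from threading import Thread
import os
import math


def calc():
    # CPU-bound work: every iteration runs Python bytecode under the GIL
    for i in range(0, 4000000):
        math.sqrt(i)


threads = []
# create one thread per CPU core
for i in range(os.cpu_count()):
    print('registering thread %d' % i)
    threads.append(Thread(target=calc))

# start all threads, then wait for all of them to finish
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()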
[Screenshot: what the threading example looks like when executed]
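# multi processing example
from multiprocessing import Process
import os
import math


def calc():
    # CPU-bound work: each process has its own interpreter and its own GIL
    for i in range(0, 70000000):
        math.sqrt(i)


# the guard is needed where child processes are started with "spawn"
# (e.g. Windows, macOS), because they re-import this module
if __name__ == '__main__':
    processes = []
    for i in range(os.cpu_count()):
        print('registering process %d' % i)
        processes.append(Process(target=calc))

    for process in processes:
        process.start()

    for process in processes:
        process.join()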
[Screenshot: what the multiprocessing example looks like when executed]
So if your program mostly does I/O, data pulling, or network work, definitely use threads in Python; if it does all of its processing on the CPU, I suggest using multiprocessing.
I hope this cleared away some of the smoke and changes how you think about using them in Python compared to other languages.
Python code for the examples: https://github.com/engineer-man/youtube/tree/master/036
Thanks for reading and happy coding, all!