Small change, big difference
Sometimes the small things in life can make the biggest difference. This time it is a small adventure in Python.
I was just working on a small PoC (Proof of Concept) to do some nginx testing. I wanted a working nginx, two simple Python APIs, and a client that would send requests to the nginx instance, which would load balance between those two Python API instances. For some reason it would not collect the statistics correctly. Below is a simplified example that still contains the core of what I was trying to do.
Concurrency
In Python it is very simple to do concurrency. You create an executor and submit functions to it. In code:
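from concurrent.futures import ProcessPoolExecutor

var = 0

def func(var):
    # This var is a local name inside the worker, not the module-level var
    var = var + 1
    print(var)

if __name__ == "__main__":  # guard needed where worker processes are spawned
    with ProcessPoolExecutor() as pool:
        for _ in range(10):
            pool.submit(func, var)
    print(var)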
If you run it you will see something weird: every worker prints 1, and the final print shows 0. What is going on here?
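A quick way to investigate is to ask each task which process it runs in. The sketch below is not part of the original PoC; whoami and worker_pids are made-up names, and the only thing it relies on is os.getpid() and the executor's map method.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def whoami(_):
    # Runs inside a worker and returns the PID of the process executing it.
    return os.getpid()


def worker_pids(n_tasks=10):
    # Collect the distinct PIDs that handled the tasks.
    with ProcessPoolExecutor() as pool:
        return set(pool.map(whoami, range(n_tasks)))


if __name__ == "__main__":
    parent = os.getpid()
    pids = worker_pids()
    print("parent pid:", parent)
    print("worker pids:", sorted(pids))
    print("parent did the work itself:", parent in pids)
```

The parent PID never shows up among the worker PIDs, which is a strong hint that the tasks do not run where the module-level var lives.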
Processes
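The key indicator is the ProcessPoolExecutor: it creates separate interpreter processes and runs the code inside them. So that var is a completely different instance as well. On top of that, func only rebinds its own parameter, so the var in the parent is never touched. It took some time for me to realize this. So how to fix this? Either switch to ThreadPoolExecutor (threads share the module's globals, so func can update var through global var instead of a parameter), or do the following: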
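from concurrent.futures import ProcessPoolExecutor
from multiprocessing.managers import SyncManager

ADDRESS = ("127.0.0.1", 5566)  # loopback only; the original bound to ''
AUTHKEY = b"secret"

var = 0  # only the copy inside the manager's server process gets updated

def update_var():
    global var
    var = var + 1

def get_var():
    return var

def func():
    # Each worker connects to the running manager as a client
    m = SyncManager(address=ADDRESS, authkey=AUTHKEY)
    m.register("update_var")
    m.connect()
    m.update_var()
    # No m.shutdown() here: a manager that only connected has nothing to shut down

if __name__ == "__main__":
    manager = SyncManager(address=ADDRESS, authkey=AUTHKEY)
    # Named functions instead of a lambda, so the registry can be pickled
    manager.register("update_var", callable=update_var)
    manager.register("get_var", callable=get_var)
    manager.start()

    with ProcessPoolExecutor() as pool:
        futs = [pool.submit(func) for _ in range(10)]
        for fut in futs:
            fut.result()  # re-raises any exception from a worker

    print(manager.get_var())  # should print 10
    manager.shutdown()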
That is quite the transformation. In essence, what we need extra is a Manager to synchronize between processes, with two registered callables: one for updating the var and one for getting the var.
Then every function needs to connect to the Manager and register the same names a second time on the client side.
If one wants to get rid of the global var, one could make a simple class that holds the state and register an instance of it with the manager as well.
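The class-based idea could look roughly like the sketch below. It is a variation rather than a drop-in replacement for the code above: instead of a fixed address and a second registration in every worker, it registers the class on a BaseManager subclass and passes the resulting proxy to the workers. Counter, CounterManager, bump and count_with_manager are names I made up for the illustration.

```python
import threading
from concurrent.futures import ProcessPoolExecutor
from multiprocessing.managers import BaseManager


class Counter:
    """Holds the state that used to live in the global var."""

    def __init__(self):
        self._value = 0
        # The manager serves each client connection in its own thread,
        # so guard the state with a lock.
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


class CounterManager(BaseManager):
    """Manager subclass so the registration does not touch SyncManager."""


# Registering the class lets the manager create a Counter in its server process.
CounterManager.register("Counter", Counter)


def bump(counter):
    # counter is a proxy; the call is forwarded to the manager process.
    counter.increment()


def count_with_manager(n_tasks=10):
    with CounterManager() as manager:
        counter = manager.Counter()  # the instance lives in the manager process
        with ProcessPoolExecutor() as pool:
            futs = [pool.submit(bump, counter) for _ in range(n_tasks)]
            for fut in futs:
                fut.result()  # surface any worker exception
        return counter.value()


if __name__ == "__main__":
    print(count_with_manager())
```

The state now lives in exactly one place, the Counter instance inside the manager process, and the workers only ever hold a proxy to it.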
Conclusion
Sometimes something simple turns out to be quite complicated. A bonus solution would be to use shared_memory. That could be something for another post.
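Until then, here is a small taste of what the shared_memory route might look like. It is only a sketch under my own assumptions: every task writes to its own byte in a shared block, so no lock is needed, and the parent sums the bytes afterwards. N_TASKS, mark_done and count_with_shared_memory are hypothetical names, and a real counter that several processes increment in place would need extra synchronization.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory

N_TASKS = 10  # hypothetical number of tasks, matching the examples above


def mark_done(name, index):
    # Attach to the existing block by name and write to this task's own byte.
    shm = shared_memory.SharedMemory(name=name)
    try:
        shm.buf[index] = 1
    finally:
        shm.close()


def count_with_shared_memory():
    shm = shared_memory.SharedMemory(create=True, size=N_TASKS)
    try:
        # Start from a known state: one zero byte per task.
        shm.buf[:N_TASKS] = bytes(N_TASKS)
        with ProcessPoolExecutor() as pool:
            futs = [pool.submit(mark_done, shm.name, i) for i in range(N_TASKS)]
            for fut in futs:
                fut.result()  # surface any worker exception
        # The block may be larger than requested, so only sum our own slots.
        return sum(shm.buf[:N_TASKS])
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block


if __name__ == "__main__":
    print(count_with_shared_memory())
```

Giving each task its own slot sidesteps the question of atomic increments, which is exactly the part that would deserve its own post.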