Small change, big difference

Sometimes the small things in life can make the biggest difference. This time it is a small adventure in Python.

I was working on a small PoC (proof of concept) to do some nginx testing. I wanted a working nginx, two simple Python APIs, and a client sending requests to the nginx instance, which would load-balance across the two Python API instances. For some reason the statistics were not collected correctly. Below is a simplified example that keeps the core of what I was trying to do.

Concurrency

Concurrency in Python is very simple. You create an executor and submit work to it. In code:

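from concurrent.futures import ProcessPoolExecutor

var = 0


def func(var):
    # This var is a local copy that lives in the worker process.
    var = var + 1
    print(var)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when the start method re-imports the module (spawn).
    with ProcessPoolExecutor() as pool:
        for _ in range(10):
            pool.submit(func, var)

    print(var)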

If you run it you will see something weird: each of the ten calls prints 1, and the final print shows 0. What is going on here?
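One way to investigate is to make each task report where it ran. This is a minimal sketch, not the original PoC code: it assumes func may return its result instead of printing it, and the helper name collect is made up for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor

var = 0


def func(var):
    var = var + 1
    # Return the new value together with the PID of the process that computed it.
    return var, os.getpid()


def collect(n_tasks=10):
    # Hypothetical helper: submit the tasks and gather their return values.
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(func, var) for _ in range(n_tasks)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = collect()
    print("parent pid:", os.getpid(), "parent var:", var)
    for value, pid in results:
        print("worker pid:", pid, "returned:", value)
```

The worker PIDs differ from the parent PID, which is a strong hint about where the increments end up. Returning values through the futures is also often the simplest way to get results back to the parent.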

Processes

The key indicator is the ProcessPoolExecutor: it runs the code in separate processes, each with its own interpreter. So the var in there is a completely different instance as well, and the var in the main process is never touched. It took some time for me to realize this. So how to fix this? Either switch to ThreadPoolExecutor (threads share the same memory, so func can update var through global), or do the following:

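from concurrent.futures import ProcessPoolExecutor, wait
from multiprocessing.managers import SyncManager

ADDRESS = ("127.0.0.1", 5566)  # localhost only
AUTHKEY = b"secret"

var = 0


def update_var():
    # Runs inside the manager's server process and changes the var held there.
    global var
    var = var + 1


def get_var():
    # A named function instead of a lambda, so it can be pickled
    # when the manager is started with the spawn start method.
    return var


def func():
    # Each worker connects to the already running manager as a client.
    m = SyncManager(address=ADDRESS, authkey=AUTHKEY)
    m.register("update_var")
    m.connect()
    m.update_var()
    # No m.shutdown() here: shutdown() is only available on a manager
    # that was started with start(), not on one that used connect().


if __name__ == "__main__":
    manager = SyncManager(address=ADDRESS, authkey=AUTHKEY)
    manager.register("update_var", callable=update_var)
    manager.register("get_var", callable=get_var)
    manager.start()

    futs = []
    with ProcessPoolExecutor() as pool:
        for _ in range(10):
            futs.append(pool.submit(func))
    wait(futs)
    print(manager.get_var())
    manager.shutdown()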

That is quite the transformation. In essence, the extra thing we need is a Manager to synchronize between processes, with two registered functions: one to update var and one to read it.

Every worker then has to connect to the Manager and register the names it wants to call a second time (without a callable) before it can use them.

To get rid of the global var in the manager version, you could write a simple class that holds the state and register it with the manager as well.
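A sketch of that idea follows. It is one possible shape, not the code from the PoC: the Counter and CounterManager names are made up, it subclasses BaseManager instead of using SyncManager with a fixed address, and it hands the proxy to the workers directly instead of having each worker connect on its own.

```python
import threading
from concurrent.futures import ProcessPoolExecutor
from multiprocessing.managers import BaseManager


class Counter:
    """Hypothetical state holder; the real object lives in the manager process."""

    def __init__(self):
        self._value = 0
        # The manager may serve several clients concurrently, so guard the update.
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    def get(self):
        return self._value


class CounterManager(BaseManager):
    """Hypothetical manager subclass that only knows about Counter."""


CounterManager.register("Counter", Counter)


def func(counter):
    # counter is a proxy; the call is forwarded to the single real Counter.
    counter.increment()


def run(n_tasks=10):
    # Entering the with block starts the manager, leaving it shuts it down.
    with CounterManager() as manager:
        counter = manager.Counter()
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(func, counter) for _ in range(n_tasks)]
            for f in futures:
                f.result()  # re-raises any exception from the worker
        return counter.get()


if __name__ == "__main__":
    print(run())
```

Because the proxy can be pickled and passed to pool.submit, the workers no longer need the address, the authkey, or a second register call.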

Conclusion

Sometimes something simple turns out to be quite complicated. A bonus solution would be to use shared_memory. That could be something for another post.

#devlife #python