
Multiprocessing/threading: Data Appending & Output Return

I have a lengthy function called run below that contains a few instances of appending data:

from multiprocessing import Process

data = []

def run():
    global data
    ...

Solution 1:

Multiprocessing spawns a separate process with its own copies of the global variables from the current environment. Changes made to a variable in that process are not reflected in the parent process. You need to share memory between the processes; variables placed in shared memory can be exchanged.

You can use multiprocessing.Manager to create a shared object such as a list or dictionary, and manipulate that object from the worker processes.

Processes are assigned to different cores/threads of your processor. If you have a 4-core/8-thread system, spawn at most 7 processes to maximize performance; any more than that and processes start interfering with each other and can eat into the CPU time allotted to your OS, which can make the whole system sluggish or unresponsive. As a rule of thumb, use (CPU cores/threads - 1) processes for stable processing, leaving at least one core for the OS to handle other operations.
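A quick sketch of that rule of thumb (cpu_count() reports logical CPUs, i.e. hardware threads; n_workers is just an illustrative name):

from multiprocessing import cpu_count

# Leave one logical CPU free for the OS; never drop below one worker.
n_workers = max(1, cpu_count() - 1)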

You can modify your code like this:

from multiprocessing import Process, Manager, cpu_count
import time

def run(list_):
    list_.append('trace')  # placeholder for whatever your real run() appends

if __name__ == "__main__":
    jobs = []
    gen_count = 0
    leaked_count = 0
    system_count = 0
    with Manager() as manager:
        list_ = manager.list()
        for _ in range(cpu_count() - 1):
            p = Process(target=run, args=(list_,))  # args must be a tuple, hence the trailing comma
            jobs.append(p)
            p.start()
        while True:  # stops the main thread from completing execution
            time.sleep(5)  # wait 5 seconds before checking the processes
            if all(not x.is_alive() for x in jobs):  # check if all processes terminated
                break  # break the loop

Solution 2:

The way multiprocessing works, each subtask runs in its own memory space and gets its own copy of any global variables. A common way around this limitation is to use a multiprocessing.Manager to hold the shared data; the manager coordinates concurrent access to it and transparently prevents the problems that such access might otherwise cause.
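To make that concrete, here is a small sketch (not from the question) showing that appending to a plain global in a child process leaves the parent's copy untouched:

import multiprocessing

data = []  # a plain global: each process works on its own copy

def run():
    data.append('trace')  # modifies only the child's copy

if __name__ == '__main__':
    p = multiprocessing.Process(target=run)
    p.start()
    p.join()
    print(data)  # prints [] -- the parent's list never changed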

Below is an example of doing that based on your sample code. It also uses a multiprocessing.Pool(), which makes it easy to create a fixed-size collection of worker processes, each of which can deliver its subtask's result asynchronously (or you can wait until all of them have finished before retrieving the results, as is done here).

from functools import partial
import multiprocessing

def run(data, i):
    data.append('trace%d' % i)
    return 1, 2, 3  # values to add to gen_count, leaked_count, and system_count

if __name__ == '__main__':
    N = 10
    manager = multiprocessing.Manager()  # create SyncManager
    data = manager.list()  # create a shared list
    pool = multiprocessing.Pool()

    async_result = pool.map_async(partial(run, data), range(N))  # dispatch the N subtasks
    values = tuple(zip(*async_result.get()))  # wait for all results, then transpose into per-field columns
    gen_count = sum(values[0])
    leaked_count = sum(values[1])
    system_count = sum(values[2])

    print(data)
    print('Totals:  gen_count {}, leaked_count {}, system_count {}'.format(
            gen_count, leaked_count, system_count))

Output:

['trace0', 'trace1', 'trace2', 'trace4', 'trace3', 'trace5', 'trace8', 'trace6', 'trace7', 'trace9']
Totals:  gen_count 10, leaked_count 20, system_count 30
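One small follow-up on the sample: the pool is never shut down. The usual explicit cleanup with the standard Pool API is:

pool.close()  # stop accepting new tasks
pool.join()   # wait for the worker processes to exit

Alternatively, since Python 3.3 a Pool can be used as a context manager (with multiprocessing.Pool() as pool: ...), which terminates the workers when the block exits.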
