
Multiprocessing A For Loop In Python

I have a program that currently takes a very long time to run since it processes a large number of files. I was hoping to be able to run the processing in parallel to speed it up.

Solution 1:

I suppose the quickest / simplest way to get there is to use a multiprocessing pool and let it map your task across an iterable (of your files). A minimal example with a fixed number of workers and a little extra output to observe the behavior would be:

import datetime
import time

from multiprocessing import Pool

def long_running_task(filename):
    # Simulate a slow per-file job and report when it finishes.
    time.sleep(1)
    print(f"{datetime.datetime.now()} finished: {filename}")

filenames = range(15)

if __name__ == "__main__":
    # The guard is needed on platforms that spawn workers (Windows, macOS).
    with Pool(10) as mp_pool:
        mp_pool.map(long_running_task, filenames)

This creates a pool of 10 workers and calls long_running_task with each item from filenames (here just a range of ints 0..14 as a stand-in), handing out the next item as soon as a worker finishes its current task and becomes available.
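
If each task returns a value, Pool.map also collects those return values for you, in the same order as the input iterable. A minimal sketch, assuming a hypothetical count_lines task and a directory of .txt files as the input:

import os
from multiprocessing import Pool

def count_lines(filename):
    # Hypothetical per-file task: return the number of lines in the file.
    with open(filename) as fh:
        return sum(1 for _ in fh)

if __name__ == "__main__":
    # Assumed input: the .txt files in the current directory.
    filenames = [f for f in os.listdir(".") if f.endswith(".txt")]
    with Pool(10) as mp_pool:
        # map blocks until every task has finished and returns the
        # results in the same order as the input iterable.
        line_counts = mp_pool.map(count_lines, filenames)
    print(dict(zip(filenames, line_counts)))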

Alternatively, if you wanted to iterate over the inputs yourself, you could do something like:

with Pool(10) as mp_pool:
    for fn in range(15):
        # Submit each item as its own task without waiting for it to finish.
        mp_pool.apply_async(long_running_task, (fn,))
    # Stop accepting new tasks, then wait for the submitted ones to complete.
    mp_pool.close()
    mp_pool.join()

This passes fn as the first positional argument to each long_running_task call. After submitting all the work, we close the pool so it stops accepting new requests, and join it to wait for the outstanding jobs to finish.
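
apply_async also returns an AsyncResult handle for each submission, so you can collect return values yourself by calling .get() on those handles. A minimal sketch with a stand-in square task (not from the original question):

from multiprocessing import Pool

def square(n):
    # Stand-in task that returns a value.
    return n * n

if __name__ == "__main__":
    with Pool(10) as mp_pool:
        # Each submission immediately returns an AsyncResult handle.
        handles = [mp_pool.apply_async(square, (n,)) for n in range(15)]
        # .get() blocks until that particular task is done and returns its
        # result (or re-raises an exception raised in the worker).
        print([h.get() for h in handles])

Because .get() already waits for each task, collecting all the results this way makes the explicit close()/join() step unnecessary.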
