
How Do I Make My Implementation Of Greedy Set Cover Faster?

I came up with the following implementation of greedy set cover after much discussion of my original question here. From the help I received, I encoded the problem into

Solution 1:

I used a trick when I implemented the famous greedy algorithm for set cover (no weights) in Matlab. You might be able to extend this trick to the weighted case somehow, ranking sets by cardinality divided by weight instead of by cardinality alone. Moreover, if you use the NumPy library, porting the Matlab code to Python should be very easy.

Here is the trick:

  1. (optional) I sorted the sets in descending order of cardinality (i.e. the number of elements they contain). I also stored their cardinalities.
  2. I select a set S; in my implementation it is the largest (i.e. the first set in the list), and I count how many uncovered elements it contains. Say it contains n uncovered elements.
  3. Since I now know there is a set S with n uncovered elements, I don't need to process any set whose cardinality is lower than n, because it cannot beat S. So I only need to search for the best set among the sets with cardinality at least n, and thanks to the sorting I can focus on those easily (see the sketch after this list).
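
A minimal Python sketch of this pruning trick, assuming the input is a list of Python sets; the function and variable names are illustrative, not from the original post:

def greedy_set_cover(sets, universe):
    # Step 1: scan sets from largest cardinality to smallest.
    order = sorted(range(len(sets)), key=lambda i: len(sets[i]), reverse=True)
    uncovered = set(universe)
    cover = []
    while uncovered:
        best_i, best_n = None, 0
        for i in order:
            # Step 3: a set can never cover more uncovered elements than its
            # cardinality, so once sizes drop to best_n we can stop scanning.
            if len(sets[i]) <= best_n:
                break
            # Step 2: count the uncovered elements in this candidate.
            n = len(sets[i] & uncovered)
            if n > best_n:
                best_i, best_n = i, n
        if best_i is None:  # the remaining elements cannot be covered
            break
        cover.append(best_i)
        uncovered -= sets[best_i]
    return cover

For the weighted case, the same early exit should work if you sort by cardinality divided by weight, since len(S)/w(S) is an upper bound on the uncovered-elements-per-weight ratio any set S can achieve.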

Solution 2:

What sort of times are you getting, and what do you need? Surely most of the execution time is spent in C-level code computing set intersections, so there's not much optimization left to do? With some random data (results may vary with your data, of course; I'm not sure if these are representative values) of 100000 sets, 40 elements in each set, 500 unique elements, and weights random from 1 to 10:

import random

print('generating test data')
num_sets = 100000
set_size = 40
elements = list(range(500))
U = set(elements)
R = set(U)  # uncovered elements; copy so that updating R leaves U intact
S = []
for i in range(num_sets):
    random.shuffle(elements)
    S.append(set(elements[:set_size]))
w = [random.randint(1, 10) for i in range(num_sets)]  # one weight per set, 1..10

C = []      # sets chosen so far
costs = []  # their costs
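
The profiled driver isn't included in the answer; the names findMin and func below match the profile output, but their bodies are a hypothetical reconstruction of a standard weighted-greedy loop (pick the set minimizing weight per newly covered element), not the original code:

import cProfile

def findMin(S, R, w):
    # Hypothetical reconstruction: choose the set with the lowest
    # weight per newly covered element (one intersection per set).
    best_i, best_cost = None, float('inf')
    for i, s in enumerate(S):
        covered = s & R
        if covered:
            cost = w[i] / float(len(covered))
            if cost < best_cost:
                best_i, best_cost = i, cost
    return best_i, best_cost

def func():
    global R
    while R:
        i, cost = findMin(S, R, w)
        if i is None:
            break
        C.append(S[i])
        costs.append(cost)
        R = R.difference(S[i])

cProfile.run('func()')

The call counts line up with this shape: 41 greedy iterations times 100000 sets gives the 4.1 million intersection calls seen below.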

I got performance like this with cProfile:

         8200209 function calls in 14.391 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.391   14.391 <string>:1(<module>)
       41    4.802    0.117   14.389    0.351 test.py:23(findMin)
        1    0.001    0.001   14.391   14.391 test.py:40(func)
  4100042    0.428    0.000    0.428    0.000 {len}
       82    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       41    0.001    0.000    0.001    0.000 {method 'difference' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  4100000    9.160    0.000    9.160    0.000 {method 'intersection' of 'set' objects}

Hm, so apparently about 1/3 of the time isn't spent in set intersections after all. But I personally wouldn't optimize any further, especially at the cost of clarity: the other 2/3 is inside the C-level set intersection code, where there's not much you can do, so why bother?
