Working Out Dependent Groups

February 02, 2023 Post a Comment

Using the following function I can generate some test data. import random, string a = list(string.ascii_lowercase) def gen_test_data(): s = [] for i in xrange(15):

Solution 1:

This is a classic connected components problem. There are a number of efficient linear-time or nearly-linear-time algorithms to solve it, such as with a graph search algorithm like depth-first search or with a union-find data structure.

For a search-based algorithm, you would set up a graph based on your input, with nodes in the input sublists connected, then run a search of the graph to find which nodes are reachable from each other. Graph libraries like NetworkX can handle most of the implementation for you, or you could write it yourself. For example,

import collections

def build_graph(data):
    graph = collections.defaultdict(list)
    for sublist in data:
        # It's enough to connect each sublist element to the first element.
        # No need to connect each sublist element to each other sublist element.
        for item in sublist[1:]:
            graph[sublist[0]].append(item)
            graph[item].append(sublist[0])
        if len(sublist) == 1:
            # Make sure we add nodes even if they don't have edges.
            graph.setdefault(sublist[0], [])
    return graph

def dfs_connected_component(graph, node):
    frontier = [node]
    visited = set()
    while frontier:
        frontier_node = frontier.pop()
        if frontier_node in visited:
            continue
        visited.add(frontier_node)
        frontier.extend(graph[frontier_node])
    return visited

def dependent_groups(data):
    graph = build_graph(data)
    components = []
    nodes_with_component_known = set()
    for node in graph:
        if node in nodes_with_component_known:
            continue
        component = dfs_connected_component(graph, node)
        components.append(component)
        nodes_with_component_known.update(component)
    return components

This would have runtime linear in the size of the input.

You could also use a union-find data structure. A union-find data structure associates items with sets, each set represented by a representative element. They support two operations: find, which finds the representative for an element, and union, which takes two elements and joins their sets into one set.

You can set up a union-find data structure, and for each sublist in your input, union each element of the sublist with the first element of the sublist. This will efficiently group all dependent elements without joining any independent elements together.

With the standard implementation of a union-find data structure as a disjoint-set forest with union by rank and path compression, the runtime would be essentially linear in the size of the input. It'd be slower by a factor of the inverse Ackermann function of the input, which is essentially constant for all practical input sizes.

Solution 2:

These are the connected components of a graph and you can use networkx to generate them:

>>> import networkx as nx
>>> graph = nx.Graph()
>>> for path in data:
...     nx.add_path(graph, path)
...
>>> [list(component) for component in nx.connected_components(graph)]
[
    ['h','q','c','d','a','g','p','f','s','w','i','v','u','x','z','y','k','l'],
    ['b'],
    ['t']
]

Solution 3:

Here is one way to do it (using a recursive algorithm):

lst = [
    ['a', 's'],
    ['f', 'c'],
    ['w', 'z'],
    ['z', 'p'],
    ['z', 'u', 'g'],
    ['v', 'q', 'w'],
    ['y', 'w'],
    ['d', 'x', 'i'],
    ['l', 'f', 's'],
    ['z', 'g'],
    ['h', 'x', 'k'],
    ['b'],
    ['t'],
    ['s', 'd', 'c'],
    ['s', 'w', 'd']
]

def find_deps(letter, lst, already_done=set()):
    if letter in already_done:
        return already_done
    already_done = already_done.union(set([letter]))
    to_do = set()

    ## First scan the list to see what's there to process
    for sublist in lst:
        if letter in sublist:
            newset = set(x for x in sublist if x != letter)
            to_do = to_do.union(newset)

    ## Process first-dependents
    out = to_do.copy()
    for lll in to_do:
        out = out.union(find_deps(lll, lst, already_done=already_done))
    return out.union(set([letter]))

print find_deps('a', lst)
# set(['a', 'c', 'd', 'g', 'f', 'i', 'h', 'k', 'l', 'q', 'p', 's', 'u', 'w', 'v', 'y', 'x', 'z'])

print find_deps('b', lst)
# set(['b'])

print find_deps('t', lst)
# set(['t'])

Python Guru

Working Out Dependent Groups

Solution 1:

Solution 2:

Solution 3:

Post a Comment for "Working Out Dependent Groups"