Skip to content Skip to sidebar Skip to footer

Most Pythonic Way To Remove Tuples From A List If First Element Is A Duplicate

The code I have so far is pretty ugly: orig = [(1,2),(1,3),(2,3),(3,3)] previous_elem = [] unique_tuples = [] for tuple in orig: if tuple[0] not in previous_elem: u

Solution 1:

If you don't care which tuple round you return for duplicates, you could always convert your list to a dictionary and back:

>>>orig = [(1,2),(1,3),(2,3),(3,3)]>>>list(dict(orig).items())
[(1, 3), (2, 3), (3, 3)]

If you want to return the first tuple round, you could reverse your list twice and use an OrderedDict, like this:

>>>from collections import OrderedDict>>>orig = [(1,2),(1,3),(2,3),(3,3)]>>>new = list(OrderedDict(orig[::-1]).items())[::-1]
[(1, 2), (2, 3), (3, 3)]

These are not the most efficient solutions (if that's of great importance) , but they do make for nice idiomatic one-liners.


Some benchmarking

Note the difference in speed, and that should you not care which tuple round you return, the first option is much more efficient:

>>> import timeit
>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
'''>>> print (min(timeit.Timer('(list(dict(orig).items()))', setup=setup).repeat(7, 1000)))
0.0015771419037069459

compared to

>>>setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
from collections import OrderedDict
'''>>> print (min(timeit.Timer('(list(OrderedDict(orig[::-1]).items())[::-1])', 
             setup=setup).repeat(7, 1000)))
0.024554947372323

The first option is nearly 15 times faster according to these speed tests.

That being said however, Saksham's answer is also O(n) and smashes these dictionary methods efficiency wise:

>>> setup = '''
orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
def fun():
    for (a, b) in orig:
        if not a in seen:
            newlist.append((a, b))
            seen.add(a)
    return newlist
'''>>> print (min(timeit.Timer('fun()', setup=setup).repeat(7, 1000)))
0.0004833390384996095

Solution 2:

If you want the first containing a specific key to always be the one that appears in the final list:

list(reversed(collections.OrderedDict( reversed([(1,2),(1,3),(2,3),(3,3)])).items()))

Which results in:

[(1, 2), (2, 3), (3, 3)]

Solution 3:

If you do not wish to store in an additional data structure, the time complexity O(n^2), as pointed out in the comments:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
for (a, b) in orig:
    if not any(x == a for x, y in newlist):
        newlist.append((a, b))
print newlist    # prints [(1, 2), (2, 3), (3, 3)]

A little book-keeping can reduce it to linear time:

orig = [(1,2),(1,3),(2,3),(3,3)]
newlist = []
seen = set()
for (a, b) in orig:
    if not a in seen:
        newlist.append((a, b))
        seen.add(a)
print newlist    # prints [(1, 2), (2, 3), (3, 3)]

Post a Comment for "Most Pythonic Way To Remove Tuples From A List If First Element Is A Duplicate"