Python, Convert A Dictionary To A Sorted List By Value Instead Of Key

December 22, 2023 Post a Comment

I have a collections.defaultdict(int) that I'm building to keep count of how many times a key shows up in a set of data. I later want to be able to sort it (obviously by turning it

Solution 1:

A dict's keys, reverse-sorted by the corresponding values, can best be gotten as

sorted(adict, key=adict.get, reverse=True)

since you want key/value pairs, you could work on the items as all other answers suggest, or (to use the nifty adict.get bound method instead of itemgetters or weird lambdas;-),

[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]

Edit: in terms of performance, there isn't much into it either way:

$ python -mtimeit -s'adict=dict((x,x**2) forxinrange(-5,6))' '[(k, adict[k]) forkinsorted(adict, key=adict.get, reverse=True)]'
100000 loops, best of 3: 10.8 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) forxinrange(-5,6)); from operator import itemgetter' 'sorted(adict.iteritems(), key=itemgetter(1), reverse=True)'
100000 loops, best of 3: 9.66 usec per loop
$ python -mtimeit -s'adict=dict((x,x**2) forxinrange(-5,6))' 'sorted(adict.iteritems(), key=lambda (k,v): v, reverse=True)'
100000 loops, best of 3: 11.5 usec per loop

So, the .get-based solution is smack midway in performance between the two items-based ones -- slightly slower than the itemgetter, slightly faster than the lambda. In "bottleneck" cases, where those microsecond fractions are crucial to you, by all means do focus on that. In normal cases, where this operation is only one step within some bigger task and a microsecond more or less matters little, focusing on the simplicity of the get idiom is, however, also a reasonable alternative.

Solution 2:

To get the dictionary sorted:

from operator import itemgetter

sorted(adict.iteritems(), key=itemgetter(1), reverse=True)

Solution 3:

Just sort the resulting dict by values:

for k, v in sorted(adict.items(), key=lambda kv: kv[1], reverse=True):
    print("%s => %s" % (k,v))

Solution 4:

Note: I'm putting this in as an answer so that it gets seen. I don't want upvotes. If you want to upvote anyone, upvote Nadia.

The currently accepted answer gives timing results which are based on a trivially small dataset (size == 6 - (-5) == 11). The differences in cost of the various methods are masked by the overhead. A use case like what are the most frequent words in a text or most frequent names in a membership list or census involves much larger datasets.

Repeating the experiment with range(-n,n+1) (Windows box, Python 2.6.4, all times in microseconds):

n=5: 11.5, 9.34, 11.3 n=50: 65.5, 46.2, 68.1 n=500: 612, 423, 614

These results are NOT "slightly" different.The itemgetter answer is a clear winner on speed.

There was also mention of "the simplicity of the get idiom". Putting them close together for ease of comparison:

[(k, adict[k]) for k in sorted(adict, key=adict.get, reverse=True)]sorted(adict.iteritems(), key=itemgetter(1), reverse=True)

The get idiom not only looks up the dict twice (as J. F. Sebastian has pointed out), it makes one list (result of sorted()) then iterates over that list to create a result list. I'd call that baroque, not simple. YMMV.

Solution 5:

If you're using the newest python 2.7 alpha, then you can use the Counter class in collections module:

c = Counter()

c['someval'] += 1
c['anotherval'] += 1
c['someval'] += 1

print c.most_common()

prints in the correct order:

[('someval', 2), ('anotherval', 1)]

The code used on 2.7 is available already and there's a version adapted to 2.5. Perhaps you want to use it to stay forward compatible with the native stdlib version that is about to be released.

Python Guru