Calculating Precision, Recall And F-score In One Pass - Python

February 26, 2024

Accuracy, precision, recall and F-score are measures of the quality of machine-learning systems. They depend on a confusion matrix of true/false positives/negatives. Given a bi…

Solution 1:

What is the pythonic way to get the counts of the true/false positives/negatives without multiple loops through the dataset?

I would use a collections.Counter, which does roughly what you're doing with all of the ifs at the end (you should be using elifs there, as your conditions are mutually exclusive):

from collections import Counter

counts = Counter(zip(predicted, gold))

Then, e.g., true_pos = counts[1, 1] (each key is a (predicted, gold) pair).
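A slightly fuller sketch of the Counter approach, with all four cells read out explicitly. It assumes predicted and gold are equal-length sequences of 0/1 labels; the sample data is made up for illustration. Because the keys are (predicted, gold) pairs, the argument order passed to zip decides which cell is a false positive and which is a false negative.

```python
from collections import Counter

# Hypothetical sample data: 1 = positive class, 0 = negative class
predicted = [1, 0, 1, 1, 0, 0, 1, 0]
gold = [1, 0, 0, 1, 1, 0, 1, 0]

# A single pass over the data: count every (predicted, gold) pair
counts = Counter(zip(predicted, gold))

# A Counter returns 0 for missing keys, so an empty cell raises no KeyError
true_pos = counts[1, 1]   # predicted 1, actually 1
true_neg = counts[0, 0]   # predicted 0, actually 0
false_pos = counts[1, 0]  # predicted 1, actually 0
false_neg = counts[0, 1]  # predicted 0, actually 1

print(true_pos, true_neg, false_pos, false_neg)  # → 3 3 1 1
```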

How do I pythonically catch the ZeroDivisionError without the multiple try-excepts?

For a start, you should (almost) never use a bare except: clause. If you're catching ZeroDivisionErrors, write except ZeroDivisionError. You could also consider a "look before you leap" approach, checking whether the denominator is 0 before trying the division, e.g.

accuracy = (true_pos + true_neg) / float(len(gold)) if gold else 0
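A minimal sketch of the other option, assuming Python 3 (where / is true division, so the float() call is unnecessary): route every division through one helper that catches only ZeroDivisionError, so there is a single try/except instead of one per metric. The names safe_div and prf are hypothetical, and returning 0.0 for an undefined ratio is an assumed convention; returning NaN is another common choice.

```python
def safe_div(num, den):
    """Divide num by den, returning 0.0 when den is zero (hypothetical helper)."""
    try:
        return num / den
    except ZeroDivisionError:  # catch only this error, never a bare except
        return 0.0


def prf(true_pos, true_neg, false_pos, false_neg):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = safe_div(true_pos + true_neg,
                        true_pos + true_neg + false_pos + false_neg)
    precision = safe_div(true_pos, true_pos + false_pos)
    recall = safe_div(true_pos, true_pos + false_neg)
    # F1 is the harmonic mean of precision and recall
    f1 = safe_div(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f1


print(prf(3, 3, 1, 1))  # → (0.75, 0.75, 0.75, 0.75)
print(prf(0, 5, 0, 0))  # → (1.0, 0.0, 0.0, 0.0)
```

The second call shows the guarded case: with no positive predictions and no positive gold labels, precision, recall and F1 are all 0/0, and each falls back to 0.0 without a separate try/except.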

Solution 2:

This is a pretty natural use case for the bitarray package.

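import bitarray as bt

# p = predicted labels, g = gold labels (equal-length lists of 0/1)
# Convert each list to a bitarray once, then reuse it
pb, gb = bt.bitarray(p), bt.bitarray(g)
tp = (pb & gb).count()    # predicted 1, gold 1
tn = (~pb & ~gb).count()  # predicted 0, gold 0
fp = (pb & ~gb).count()   # predicted 1, gold 0
fn = (~pb & gb).count()   # predicted 0, gold 1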

There's some type conversion overhead, but after that, the bitwise operations are much faster.

With timeit at 1000 passes on my PC, 100 instances take 0.036 with your method versus 0.017 with bitarray. For 1000 instances the figures are 0.291 and 0.093; for 10000, 3.177 and 0.863. You get the idea.

It scales well, needs no Python-level loops, and avoids building a large intermediate list of tuples with zip (in Python 2, zip returns a full list; in Python 3 it returns a lazy iterator).
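If installing bitarray is not an option, the same bitwise idea can be sketched with plain Python integers used as bitsets. This is a swapped-in, standard-library-only technique, not the bitarray API; the helper names to_mask, popcount and confusion_bits are hypothetical, and no timing claims are made for it. Because ~ on a Python int is unbounded, a length mask keeps each negation within the n valid bits.

```python
def to_mask(labels):
    """Pack 0/1 labels into an int, bit i set when labels[i] is truthy (hypothetical helper)."""
    mask = 0
    for i, label in enumerate(labels):
        if label:
            mask |= 1 << i
    return mask


def popcount(x):
    """Count set bits; int.bit_count() does the same on Python 3.10+."""
    return bin(x).count("1")


def confusion_bits(predicted, gold):
    """Return (tp, tn, fp, fn) using bitwise AND/NOT on int bitsets."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    valid = (1 << len(gold)) - 1  # keeps ~ from setting bits past the last instance
    p, g = to_mask(predicted), to_mask(gold)
    not_p, not_g = ~p & valid, ~g & valid
    tp = popcount(p & g)
    tn = popcount(not_p & not_g)
    fp = popcount(p & not_g)
    fn = popcount(not_p & g)
    return tp, tn, fp, fn


print(confusion_bits([1, 0, 1, 1, 0, 0, 1, 0],
                     [1, 0, 0, 1, 1, 0, 1, 0]))  # → (3, 3, 1, 1)
```

The packing loop in to_mask plays the role of bitarray's conversion step; the four counts themselves come from whole-integer bitwise operations rather than a per-instance loop.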

Solution 3:

Depending on your needs, there are several libraries that will calculate precision, recall, F-score, etc. One that I have used is scikit-learn. Assuming that you have aligned lists of actual and predicted values, then it is as simple as...

from sklearn.metrics import precision_recall_fscore_support as pr

# With average='binary', the returned support is None
bPrecis, bRecall, bFscore, bSupport = pr(gold, predicted, average='binary')

One of the advantages of using this library is that different flavors of these metrics (micro-averaged, macro-averaged, weighted, binary, etc.) come out of the box via the average parameter.
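As a rough illustration of what two of those flavors mean, here is a hand-rolled sketch for a multiclass problem: it builds per-class counts from a single Counter pass, then macro- and micro-averages F1. It illustrates the definitions only; it is not scikit-learn's implementation, the function names are hypothetical, and returning 0.0 for an undefined F1 is an assumed convention.

```python
from collections import Counter


def per_class_counts(predicted, gold):
    """One pass over the data, then per-class tp/fp/fn from the pair counts."""
    pairs = Counter(zip(predicted, gold))
    labels = set(predicted) | set(gold)
    tp = {c: pairs[c, c] for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for (p, g), n in pairs.items():
        if p != g:
            fp[p] += n  # predicted class p, but it was really g
            fn[g] += n  # missed an instance of class g
    return labels, tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when undefined (an assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(predicted, gold):
    labels, tp, fp, fn = per_class_counts(predicted, gold)
    if not labels:
        return 0.0, 0.0
    # Macro: F1 per class, then an unweighted mean over classes
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool counts over all classes, then compute F1 once
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


macro, micro = macro_micro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"])
print(round(macro, 4), micro)  # → 0.7778 0.75
```

For single-label multiclass data like this, micro-averaged F1 always equals accuracy, because every misclassification counts once as a false positive and once as a false negative.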

Python Guru
