
Do I Have To Deviate From Pep 8 Style Conventions When Comparing To Booleans In Pandas?

I used to do the following when altering a dataframe column based on a condition (in this case, every woman gets a wage of 200). import pandas as pd df = pd.DataFrame([[False,100],[Tr

Solution 1:

You should use df['female'] with no comparison, rather than comparing to True with any operator. df['female'] is already the mask you need.

Comparison to True with == is almost always a bad idea, even in NumPy or Pandas.

Solution 2:

Just do

df.loc[df['female'], 'wage'] = 200 

In fact, df['female'] is already a Boolean Series, and it holds exactly the same values as the Boolean Series returned by evaluating df['female'] == True. (A Series is the Pandas term for a one-dimensional labelled array, such as a single column of a dataframe.)
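The equivalence can be checked directly. The following is a minimal sketch; because the original example is truncated, the rows of the dataframe and the column names here are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical data: the question's dataframe is cut off in the source,
# so these rows and column names are assumed.
df = pd.DataFrame(
    [[False, 100], [True, 100], [True, 150]],
    columns=["female", "wage"],
)

mask_plain = df["female"]             # already a Boolean Series
mask_compared = df["female"] == True  # redundant comparison, shown only to compare

# Series.equals checks that both Series hold the same values.
same_values = mask_plain.equals(mask_compared)

# Use the column itself as the mask.
df.loc[df["female"], "wage"] = 200
wages = df["wage"].tolist()

print(same_values)
print(wages)
```

Since both masks select the same rows, the shorter form loses nothing and keeps the code PEP 8 compliant.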

By the way, the fact that df['female'] is a Series is precisely why df['female'] is True can never work. In Python, the is operator tests object identity; it does not compare values for equality. df['female'] will always be a Series (if df is a Pandas dataframe), and a Series will never be the same object as the single Boolean value True.
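A small sketch of this difference, using a hypothetical stand-alone Series in place of the df['female'] column:

```python
import pandas as pd

# Hypothetical stand-in for df['female'].
female = pd.Series([False, True, True], name="female")

# Identity test: one plain Python bool for the whole object.
identity_result = female is True

# Equality test: evaluated element by element, giving a Boolean Series.
equality_result = female == True

print(type(identity_result).__name__)
print(type(equality_result).__name__)
```

The identity test collapses to a single False regardless of what the Series contains, so it cannot serve as a row mask.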

To understand this better, think of the difference in English between 'equal' and 'same'. In German, this is the difference between 'selbe' (identity) and 'gleiche' (equality). In other languages, the distinction is not as explicit.

Thus, in Python, you can compare a (reference to an) object to the special object None with if obj is None: ..., or check that two variables ('names' in Python terminology) point to the exact same object with if a is b. But identity is a much stronger assertion than equality, a == b. In fact, the result of evaluating the expression a == b might be anything, not just a single Boolean value. It all depends on the class a belongs to, that is, on its type. In your context, a == b yields a Boolean Series as long as at least one of a and b is a Pandas Series (as in df['female'] == True).
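To make the distinction concrete, here is a short sketch. The class Always42 is purely hypothetical and exists only to show that == may return whatever a class's __eq__ method decides to return.

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([1, 2, 3])  # equal values, but a separate object
c = a                     # a second name for the same object

a_is_b = a is b       # identity: two distinct objects
a_is_c = a is c       # identity: same object under two names
elementwise = a == b  # equality: a Boolean Series, one value per element


class Always42:
    """Hypothetical class whose equality test returns a non-Boolean value."""

    def __eq__(self, other):
        return 42


weird = Always42() == Always42()

print(a_is_b, a_is_c)
print(elementwise.tolist())
print(weird)
```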

By the way, if you want to check that all values agree between two Series a and b, evaluate (a == b).all(), which reduces the whole Series to a single Boolean value that is True if and only if a[i] == b[i] for every i.
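As an illustration, the reduction might look like this. The three Series are made-up examples, and the sketch assumes they share the same index, since pandas only compares identically labelled Series with ==.

```python
import pandas as pd

# Made-up Series sharing the same default index.
a = pd.Series([1, 2, 3])
b = pd.Series([1, 2, 3])
c = pd.Series([1, 2, 4])

# .all() collapses the element-wise comparison into one value;
# bool() converts the NumPy Boolean into a plain Python bool.
all_equal_ab = bool((a == b).all())
all_equal_ac = bool((a == c).all())

print(all_equal_ab)
print(all_equal_ac)
```

A single differing element, as in c, is enough to make the reduced result False.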
