
How Can I Keep The Rows Of A Pandas Data Frame That Match A Particular Condition Using value_counts() On Multiple Columns

I would like to get rid of those rows where a particular value occurs only once in a column, considering 3 columns. That is, for feature: text: if value_counts() == 1, then elimi

Solution 1:

I'm still not certain I understand your problem correctly, but see if this different approach does what you need. I'm breaking it into steps to make it understandable, but it could be done in an ugly one-liner as well.

counts_text = df_processed['text'].value_counts()
non_unique_text = df_processed['text'].apply(lambda text: counts_text[text]>1)

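value_counts() returns a Series indexed by the distinct values in the column, so counts_text[text] looks up how many times that value occurs. We're using the result as a dictionary of sorts here.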
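To make the lookup concrete, here is a minimal sketch on made-up data (the values in the toy frame are an assumption, not taken from the question). It also shows `Series.map` as a vectorized alternative to the `apply`/lambda form, which should give the same boolean result:

```python
import pandas as pd

# Toy data; the values are invented purely for illustration.
df_processed = pd.DataFrame({"text": ["a", "b", "a", "c", "b", "a"]})

# A Series indexed by the distinct values: a -> 3, b -> 2, c -> 1
counts_text = df_processed["text"].value_counts()

# Label-based lookup, the same operation the lambda performs per row.
count_of_a = counts_text["a"]

# Vectorized alternative: map each row's value to its count, then compare.
non_unique_text = df_processed["text"].map(counts_text) > 1
```

Note that value_counts() skips missing values by default, so rows holding NaN come out as False under the `map` version.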

So now we have a boolean Series with one entry per row, stating whether the value in that row is non-unique. You can do the same for each of the other columns to make non_unique_nextword and non_unique_prevword, just by replacing every instance of text above with the corresponding column name.
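Rather than copying the two lines three times, the per-column step could be wrapped in a small helper. This is only a sketch: `non_unique_mask` is a hypothetical name, and the column names `nextword` and `prevword` are assumed from the variable names used in this answer.

```python
import pandas as pd

def non_unique_mask(df, column):
    """Boolean Series: True where the row's value occurs more than once in `column`."""
    counts = df[column].value_counts()
    return df[column].map(counts) > 1

# Toy frame; column names are assumed, values are invented.
df_processed = pd.DataFrame({
    "text":     ["a", "a", "b", "c", "a"],
    "nextword": ["x", "x", "y", "y", "z"],
    "prevword": ["p", "q", "p", "q", "p"],
})

non_unique_text = non_unique_mask(df_processed, "text")
non_unique_nextword = non_unique_mask(df_processed, "nextword")
non_unique_prevword = non_unique_mask(df_processed, "prevword")
```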

Finally, we just use a logical AND to keep rows that have non-unique values in each of the three columns. Then we can get the final dataframe from the original by simple indexing:

df_nonunique = df_processed[non_unique_text & non_unique_nextword & non_unique_prevword]
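For reference, the compact version mentioned at the start might look something like the sketch below. It is one possible form, not necessarily the one-liner I had in mind, and it again assumes the columns are named `text`, `nextword` and `prevword` and uses invented data.

```python
import pandas as pd

# Toy frame; column names are assumed, values are invented.
df_processed = pd.DataFrame({
    "text":     ["a", "a", "b", "c", "a"],
    "nextword": ["x", "x", "y", "y", "z"],
    "prevword": ["p", "q", "p", "q", "p"],
})

cols = ["text", "nextword", "prevword"]

# For each column, map values to their counts and flag those seen more than once;
# then keep only the rows flagged True in all three columns.
keep = df_processed[cols].apply(lambda s: s.map(s.value_counts()) > 1).all(axis=1)
df_nonunique = df_processed[keep]
```

One thing to be aware of: the counts are taken on the original frame, so a value can end up appearing only once in the filtered result. If that matters, the filter would need to be repeated until nothing more is dropped.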

Let me know if this is way off-base.
