How Can I Keep The Rows Of A Pandas Data Frame That Match A Particular Condition Using Value_counts() On Multiple Columns

June 19, 2023

I would like to get rid of those rows where a particular value occurs only once in a column, considering 3 columns. That is, for feature: text: if value_counts() == 1, then elimi

Solution 1:

Still a little iffy if I'm understanding your problem correctly, but see if this different approach does what you need. I'm breaking it apart to make it understandable, but it could be done in an ugly one-liner as well.

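# Count how many times each distinct value occurs in the 'text' column.
counts_text = df_processed['text'].value_counts()
# Boolean Series: True where the row's 'text' value occurs more than once.
non_unique_text = df_processed['text'].apply(lambda text: counts_text[text] > 1)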

We're using the results of value_counts() as a dictionary of sorts here.
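To make the "dictionary of sorts" idea concrete, here is a small sketch on made-up data (the Series `s` and its values are purely illustrative, not from the question). `value_counts()` returns a Series indexed by the distinct values, so indexing it with a value gives that value's count, and `Series.map` can do the same lookup for every row at once:

```python
import pandas as pd

# Illustrative data only; not taken from the original question.
s = pd.Series(["a", "b", "a", "c"])

# Series indexed by the distinct values, holding each value's count.
counts = s.value_counts()

# Dictionary-style lookup of a single value's count.
a_count = counts["a"]

# Look up the count for every row, then flag values seen more than once.
mask = s.map(counts) > 1
```

Here `a_count` is 2 and `mask` is True only for the two "a" rows.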

So now we have a boolean Series with one entry per row, stating whether the value in that row is non-unique. You can do the same for each of the other columns to make non_unique_nextword and non_unique_prevword, just by replacing every occurrence of text above with the corresponding column name.

Finally, we just use a logical AND to keep rows that have non-unique values in each of the three columns. Then we can get the final dataframe from the original by simple indexing:

df_nonunique = df_processed[non_unique_text & non_unique_nextword & non_unique_prevword]
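Putting the pieces together, the following is a minimal end-to-end sketch rather than a definitive implementation. The toy data is invented, the column names 'text', 'nextword' and 'prevword' are assumed from the variable names above, and the helper `non_unique` is a hypothetical name. It uses `Series.map` in place of `apply` with a lambda, which should give the same mask for columns without missing values:

```python
import pandas as pd

# Toy data; the column names are assumptions based on the variable names above.
df_processed = pd.DataFrame({
    "text":     ["cat", "cat", "dog", "cat", "emu"],
    "nextword": ["sat", "sat", "ran", "sat", "sat"],
    "prevword": ["the", "the", "the", "a",   "the"],
})

def non_unique(column):
    """Return a boolean Series: True where the value occurs more than once."""
    counts = column.value_counts()
    return column.map(counts) > 1

# Keep only rows whose value is non-unique in all three columns.
keep = (
    non_unique(df_processed["text"])
    & non_unique(df_processed["nextword"])
    & non_unique(df_processed["prevword"])
)
df_nonunique = df_processed[keep]
```

On this toy frame only the first two rows survive. Note that the counts are taken once, on the original frame, so filtering can in general leave behind a value that occurs only once in the result; if that matters, the filter would need to be repeated until nothing changes.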

Let me know if this is way off-base.

Python Guru
