How To Check If All The Elements In List Are Present In Pandas Column
I have a dataframe and a list:
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8], 'char':[['a','b'],['a','b','c'],['a','c'],['b','c'],[],['c','a','d'],['c','d'],['a']]})
names = ['a'
Solution 1:
Use pd.Series.apply (df['char'] is a Series, so this is the Series method):
df[df['char'].apply(lambda x: set(names).issubset(x))]
Output:
id char
1 2 [a, b, c]
2 3 [a, c]
5 6 [c, a, d]
Solution 2:
You can build a set from the list of names for a faster lookup, and use set.issubset
to check if all elements in the set are contained in the column lists:
names = set(['a','c'])
df[df['char'].map(names.issubset)]
id char
1 2 [a, b, c]
2 3 [a, c]
5 6 [c, a, d]
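For reference, here is a minimal self-contained sketch of this approach, using the dataframe from the question and assuming names is ['a', 'c'] as in the snippet above. The variable names wanted, mask and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})

# Assumption: the list to look for is ['a', 'c'].
wanted = set(['a', 'c'])

# set.issubset accepts any iterable, so each list in the column can be
# passed to it directly without converting it to a set first.
mask = df['char'].map(wanted.issubset)

result = df[mask]
print(result)
```

Building the set once outside the call avoids recreating it for every row, which is the main difference from Solution 1.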
Solution 3:
Use a list comprehension with issubset:
mask = [set(names).issubset(x) for x in df['char']]
df = df[mask]
print (df)
id char
1 2 [a, b, c]
2 3 [a, c]
5 6 [c, a, d]
Another solution with Series.map:
df = df[df['char'].map(set(names).issubset)]
print (df)
id char
1 2 [a, b, c]
2 3 [a, c]
5 6 [c, a, d]
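One caveat: issubset raises a TypeError if a cell holds a missing value (NaN or None) instead of a list. The following is only a sketch of one way to guard against that; the helper has_all and the small example frame are hypothetical and not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: some rows have no list at all.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'char': [['a', 'b', 'c'], np.nan, ['a', 'c'], None],
})
wanted = {'a', 'c'}

def has_all(cell):
    # Hypothetical helper: treat anything that is not a list-like
    # container as "no match" instead of letting issubset raise.
    return isinstance(cell, (list, tuple, set)) and wanted.issubset(cell)

mask = [has_all(x) for x in df['char']]
safe = df[mask]
print(safe)
```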
Performance depends on the number of rows and the number of matched values:
df = pd.concat([df] *10000, ignore_index=True)
In [270]: %timeit df[df['char'].apply(lambda x: set(names).issubset(x))]
45.9 ms ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [271]: %%timeit
...: names = set(['a','c'])
...: [names.issubset(set(row)) for _, row in df.char.iteritems()]
...:
46.7 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [272]: %%timeit
...: df[[set(names).issubset(x) for x in df['char']]]
...:
45.6 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [273]: %%timeit
...: df[df['char'].map(set(names).issubset)]
...:
18.3 ms ± 2.96 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [274]: %%timeit
...: n = set(names)
...: df[df['char'].map(n.issubset)]
...:
16.6 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [279]: %%timeit
...: names = set(['a','c'])
...: m = [names.issubset(i) for i in df.char.values.tolist()]
...:
19.2 ms ± 317 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
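The timings above come from an IPython session. To repeat a comparison like this outside IPython, a rough sketch with the standard timeit module could look as follows. The variant labels, the repeat count and the smaller frame size (1,000 copies rather than 10,000) are arbitrary choices for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})
# Smaller than the 10,000 copies used above, to keep the sketch quick.
big = pd.concat([base] * 1000, ignore_index=True)
wanted = set(['a', 'c'])

variants = {
    'apply + lambda': lambda: big['char'].apply(lambda x: wanted.issubset(x)),
    'list comprehension': lambda: [wanted.issubset(x) for x in big['char']],
    'map + bound method': lambda: big['char'].map(wanted.issubset),
}

# Check first that every variant produces the same boolean mask.
masks = {label: list(fn()) for label, fn in variants.items()}
reference = masks['map + bound method']
all_agree = all(m == reference for m in masks.values())
print('all variants agree:', all_agree)

for label, fn in variants.items():
    seconds = timeit.timeit(fn, number=5) / 5
    print(f'{label}: {seconds * 1e3:.2f} ms per run')
```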
Solution 4:
Try this.
import numpy as np
df['char'] = df['char'].apply(lambda x: x if ("a" in x and "c" in x) else np.nan)
print(df.dropna())
Output:
id char
1 2 [a, b, c]
2 3 [a, c]
5 6 [c, a, d]
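Note that the snippet above hard-codes "a" and "c" and overwrites the char column with NaN for non-matching rows. A possible variation, sketched here under the assumption that names is ['a', 'c'], keeps the original column intact and works for any list of names by building a boolean mask with all():

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'char': [['a', 'b'], ['a', 'b', 'c'], ['a', 'c'], ['b', 'c'],
             [], ['c', 'a', 'd'], ['c', 'd'], ['a']],
})
names = ['a', 'c']  # assumption: the list from the question

# True only when every entry of names appears in the row's list.
mask = df['char'].apply(lambda x: all(name in x for name in names))

filtered = df[mask]  # df itself is left unchanged
print(filtered)
```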