Selecting Data From Pandas Dataframe Based On Criteria Stored In A Dict
I have a Pandas dataframe that contains a large number of variables. This can be simplified as:
tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],
Solution 1:
You can evaluate a series of conditions. They don't have to be just an equality.
df = tempDF
d = tempDict
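# `repr` returns the string representation of an object,
# so string values keep their quotes inside the generated expression.
# (`dict.items()` replaces the Python 2-only `iteritems()`.)
>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
...                     for col, cond in d.items()]))]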
var1 var2 var3 var4
2 12 b f 3
3 12 b f 3
Looking at what eval does here (the terms appear in the dict's iteration order):
>>> conditions = " & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
...                          for col, cond in d.items()])
>>> conditions
"(df['var4'] == 3) & (df['var2'] == 'b')"
>>> eval(conditions)
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
Here is another example that combines an equality with an inequality constraint. Each dict value is now an (operator, value) pair, and string values carry their own quotes:
>>> d = {'var2': ('==', "'b'"),
...      'var4': ('>', 3)}
>>> df[eval(" & ".join(["(df['{0}'] {1} {2})".format(col, cond[0], cond[1])
...                     for col, cond in d.items()]))]
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6
Another alternative is to use query, which takes the same kind of condition string without needing eval:
>>> qry = " & ".join('{0} {1} {2}'.format(k, cond[0], cond[1]) for k, cond in d.items())
>>> qry
"var2 == 'b' & var4 > 3"
>>> df.query(qry)
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6
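Because eval executes whatever string it is given, it may be worth avoiding when the criteria dict comes from outside your own code. The following is a minimal sketch of the same idea without eval: it maps operator strings to functions from the standard `operator` module. The sample frame, the `OPS` table and the helper name `mask_from_criteria` are all made up for illustration; they are not part of the original question or answer.

```python
import operator

import pandas as pd

# Hypothetical sample frame (the question's full data is not shown).
df = pd.DataFrame({
    "var1": [12, 12, 45, 45, 51],
    "var2": ["a", "b", "b", "b", "c"],
    "var4": [1, 3, 4, 5, 6],
})

# Assumed lookup table: operator strings mapped to comparison functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def mask_from_criteria(frame, criteria):
    """AND together one (operator, value) condition per column."""
    mask = pd.Series(True, index=frame.index)
    for col, (op, value) in criteria.items():
        mask &= OPS[op](frame[col], value)
    return mask


# Values are plain Python objects here, so no extra quoting is needed.
criteria = {"var2": ("==", "b"), "var4": (">", 3)}
result = df[mask_from_criteria(df, criteria)]
print(result)
```

An unknown operator string raises a KeyError in this sketch instead of being executed, which is the main reason to prefer a lookup table over eval.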
Solution 2:
You could create a mask for each condition with a list comprehension, then combine them by converting the list to a dataframe and calling all:
In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
Out[23]:
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
Then you could slice your dataframe with that mask:
In [24]: mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
In [25]: tempDF[mask]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
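The per-condition masks can also be combined without building an intermediate dataframe. This is a small sketch using NumPy's `np.logical_and.reduce`, which ANDs a list of boolean arrays element-wise. The frame and dict below are stand-ins with the same column names as the question; the actual data there is truncated, so these values are assumptions.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumed; the question's full frame is not shown).
tempDF = pd.DataFrame({
    "var1": [12, 12, 12, 45, 51],
    "var2": ["a", "b", "b", "b", "c"],
    "var4": [1, 3, 3, 4, 6],
})
tempDict = {"var2": "b", "var4": 3}

# One boolean Series per condition, reduced to a single boolean array.
mask = np.logical_and.reduce([tempDF[k] == v for k, v in tempDict.items()])

result = tempDF[mask]
print(result)
```

Note that an empty dict would need separate handling here, since reducing an empty list does not yield a row-wise mask.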
Solution 3:
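Here's one way to build up conditions from tempDict (NumPy is imported directly because the pd.np alias is no longer available in recent pandas versions):
In [24]: import numpy as np
In [25]: tempDF.loc[np.all([tempDF[k] == tempDict[k] for k in tempDict], axis=0), :]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3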
Or use query for a more readable, query-like string:
In [33]: tempDF.query(' & '.join(['{0}=={1}'.format(k, repr(v)) for k, v in tempDict.items()]))
Out[33]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
In [34]: ' & '.join(['{0}=={1}'.format(k, repr(v)) for k, v in tempDict.items()])
Out[34]: "var4==3 & var2=='b'"
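The string-building step above can be wrapped in a small helper so it is easy to inspect before it is passed to query. This is only a sketch: the name `dict_to_query` and the sample data are invented for illustration, and it assumes every column name is a valid Python identifier. Each term is wrapped in parentheses as a defensive choice, so the result does not rely on how `&` is prioritized.

```python
import pandas as pd


def dict_to_query(criteria):
    """Build an equality-only query string from a {column: value} dict."""
    # `!r` applies repr(), so 'b' becomes "'b'" while 3 stays 3.
    return " & ".join(
        "({0} == {1!r})".format(col, val) for col, val in criteria.items()
    )


# Stand-in data (assumed; the question's full frame is not shown).
tempDF = pd.DataFrame({
    "var1": [12, 12, 12, 45, 51],
    "var2": ["a", "b", "b", "b", "c"],
    "var4": [1, 3, 3, 4, 6],
})
tempDict = {"var2": "b", "var4": 3}

qry = dict_to_query(tempDict)
result = tempDF.query(qry)
print(qry)
print(result)
```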