
Selecting Data From Pandas Dataframe Based On Criteria Stored In A Dict

I have a Pandas dataframe that contains a large number of variables. This can be simplified as:

tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],

Solution 1:

You can build the conditions as a string and evaluate it. The conditions don't have to be equality tests.

df = tempDF
d = tempDict

# `repr` returns the string representation of an object, so string values come out quoted.
>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
       for col, cond in d.items()]))]
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
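The snippet above depends on `tempDF` and `tempDict` from the question, whose full definitions are cut off. Here is a self-contained sketch of the same eval approach on a small hypothetical frame (the data and the `criteria` name are assumptions, not the question's). Passing `{"df": df}` as the globals argument to `eval` makes explicit which names the string uses; builtins are still reachable, so this is not a sandbox.

```python
import pandas as pd

# Hypothetical stand-in data; the question's full DataFrame is truncated.
df = pd.DataFrame({
    "var2": ["a", "b", "b", "b"],
    "var4": [3, 3, 3, 4],
})
criteria = {"var4": 3, "var2": "b"}

# repr() quotes strings ('b' -> "'b'") and leaves numbers bare (3 -> "3").
conditions = " & ".join(
    "(df['{0}'] == {1})".format(col, repr(cond)) for col, cond in criteria.items()
)

# Evaluate with an explicit namespace holding the frame
# (eval still inserts __builtins__, so this is not a sandbox).
mask = eval(conditions, {"df": df})
print(conditions)
print(df[mask])
```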

Looking at what eval does here:

conditions = " & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
       for col, cond in d.items()])

>>> conditions
"(df['var4'] == 3) & (df['var2'] == 'b')"
>>> eval(conditions)
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
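Since the generated string only ANDs boolean Series together, the same mask can be built without `eval`. This sketch swaps in `functools.reduce` with `operator.and_`; that technique is not part of the answer above, and the frame and `criteria` dict are hypothetical.

```python
import functools
import operator

import pandas as pd

# Hypothetical stand-in data; the question's full DataFrame is truncated.
df = pd.DataFrame({
    "var1": [12, 12, 45, 51],
    "var2": ["a", "b", "b", "b"],
    "var4": [3, 3, 3, 4],
})
criteria = {"var2": "b", "var4": 3}

# Build one boolean Series per (column, value) pair and fold them with `&`,
# which is what the joined " & " string does, minus the eval.
mask = functools.reduce(
    operator.and_,
    (df[col] == value for col, value in criteria.items()),
)
print(df[mask])
```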

Here is another example, this time using an inequality constraint. Each value is now an (operator, operand) pair, and string operands carry their own quotes because they are inserted into the string verbatim:

d = {'var2': ('==', "'b'"),
     'var4': ('>', 3)}

>>> df[eval(" & ".join(["(df['{0}'] {1} {2})".format(col, cond[0], cond[1])
       for col, cond in d.items()]))]
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6
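The `('>', 3)` pairs can also be applied without building a string, by mapping each operator symbol to a function from Python's `operator` module. This is a swapped-in technique, sketched under the assumption that only the six comparison symbols listed in `OPS` appear; `OPS`, `select`, and the data are hypothetical. As a side effect, string operands no longer need the extra quotes (`"b"` instead of `"'b'"`).

```python
import operator

import pandas as pd

# Map the operator strings used in the criteria dict to real functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select(df, criteria):
    """Keep rows that satisfy every (op, value) condition in `criteria`."""
    mask = pd.Series(True, index=df.index)
    for col, (op, value) in criteria.items():
        mask &= OPS[op](df[col], value)
    return df[mask]


# Hypothetical data; the question's full DataFrame is truncated.
df = pd.DataFrame({"var2": ["a", "b", "b", "b"], "var4": [5, 3, 4, 6]})
result = select(df, {"var2": ("==", "b"), "var4": (">", 3)})
print(result)
```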

Another alternative is to use query:

qry = " & ".join('{0} {1} {2}'.format(k, cond[0], cond[1]) for k, cond in d.items())

>>> qry
"var4 > 3 & var2 == 'b'"
>>> df.query(qry)
   var1 var2 var3  var4
4    45    b    f     4
5    45    b    g     5
6    45    b    g     6
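A variant of the query builder that applies `repr` to each operand, so the dict can hold plain values such as `"b"` rather than pre-quoted `"'b'"`. A sketch with a hypothetical `build_query` helper and data. It relies on `query` giving `&` a lower precedence than comparisons, which is why no parentheses are needed.

```python
import pandas as pd


def build_query(criteria):
    """Join {column: (op, value)} pairs into a DataFrame.query string.

    repr() quotes string values, so "b" becomes 'b' inside the expression.
    """
    return " & ".join(
        "{0} {1} {2!r}".format(col, op, value)
        for col, (op, value) in criteria.items()
    )


criteria = {"var4": (">", 3), "var2": ("==", "b")}
qry = build_query(criteria)
print(qry)

# Hypothetical data; the question's full DataFrame is truncated.
df = pd.DataFrame({"var2": ["a", "b", "b"], "var4": [5, 3, 4]})
print(df.query(qry))
```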

Solution 2:

You could create a mask for each condition using a list comprehension, then combine them by building a DataFrame from the masks, transposing it, and calling all(axis=1):

In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
Out[23]:
0    False
1    False
2     True
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

Then you could slice your dataframe with that mask:

mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)

In [25]: tempDF[mask]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
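Building a DataFrame from a list of Series stacks each mask as a row, which is why the answer transposes with `.T`. As an equivalent, `pd.concat(..., axis=1)` places the masks side by side as columns, so no transpose is needed. This sketch uses hypothetical stand-in data.

```python
import pandas as pd

# Hypothetical stand-in data; the question's full DataFrame is truncated.
df = pd.DataFrame({"var2": ["a", "b", "b"], "var4": [3, 3, 4]})
criteria = {"var2": "b", "var4": 3}

# Each mask becomes one column; all(axis=1) keeps rows where every column is True.
masks = pd.concat([df[key] == val for key, val in criteria.items()], axis=1)
mask = masks.all(axis=1)
print(df[mask])
```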

Solution 3:

Here's one way to build up the conditions from tempDict:

In [24]: import numpy as np

In [25]: tempDF.loc[np.all([tempDF[k] == tempDict[k] for k in tempDict], axis=0), :]
Out[25]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3
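Another vectorized form of the same all-conditions test turns the dict into a Series and compares the selected columns against it with `DataFrame.eq`, which aligns a Series on the column labels. This idiom is a swapped-in alternative, not part of the answer, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data; the question's full DataFrame is truncated.
df = pd.DataFrame({
    "var1": [12, 12, 45],
    "var2": ["a", "b", "b"],
    "var4": [3, 3, 3],
})
criteria = {"var2": "b", "var4": 3}

# Select only the criteria columns, compare each one with the Series value
# carrying the same label, then require every comparison in a row to be True.
mask = df[list(criteria)].eq(pd.Series(criteria)).all(axis=1)
print(df.loc[mask])
```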

Or use query for a more readable, query-like string:

In [33]: tempDF.query(' & '.join(['{0}=={1}'.format(k, repr(v)) for k, v in tempDict.items()]))
Out[33]:
   var1 var2 var3  var4
2    12    b    f     3
3    12    b    f     3

In [34]: ' & '.join(['{0}=={1}'.format(k, repr(v)) for k, v in tempDict.items()])
Out[34]: "var4==3 & var2=='b'"
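Bare column names work in the query string above only when they are valid Python identifiers; a name containing a space would break the parse. Assuming pandas 0.25 or newer, `query` accepts backtick-quoted column names, so wrapping every name in backticks is a defensive variant. This sketch uses a hypothetical column named `var 2`.

```python
import pandas as pd

# Hypothetical data with a column name that is not a valid identifier.
df = pd.DataFrame({"var 2": ["a", "b", "b"], "var4": [3, 3, 4]})
criteria = {"var 2": "b", "var4": 3}

# Backticks let query() accept any column name; repr() still quotes string values.
qry = " & ".join("`{0}` == {1!r}".format(k, v) for k, v in criteria.items())
print(qry)
print(df.query(qry))
```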
