
Modify Function To Return Dataframe With Specified Values

With reference to the test data below and the function I use to identify values within the variable thresh of each other, can anyone please help me modify this to show the desired out

Solution 1:

Use mask and sub. The apply call runs row-wise (axis=1), and sub then aligns the per-row result on the index (axis=0):

df2.mask(df2.sub(df2.apply(closeCols2, 1), 0).abs() > thresh)

    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11

note: I'd redefine closeCols to include thresh as a parameter. Then you could pass it in the apply call.

from itertools import combinations

def closeCols2(df, thresh):
    # df is a single row here, because apply is called with axis=1
    max_value = None
    for k1, k2 in combinations(df.keys(), 2):
        if abs(df[k1] - df[k2]) < thresh:
            if max_value is None:
                max_value = max(df[k1], df[k2])
            else:
                max_value = max(max_value, max(df[k1], df[k2]))
    return max_value

df2.apply(closeCols2, 1, thresh=5)
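Putting the pieces of this solution together, here is a minimal end-to-end sketch. The frame df2 below is a hypothetical stand-in (the original test data is not shown in this answer), chosen so that it is consistent with the output above. The name close_cols is also just an illustrative rename of closeCols2.

```python
from itertools import combinations

import pandas as pd

# Hypothetical sample frame (an assumption; the original test data is not shown here).
df2 = pd.DataFrame({
    "AAA": [4, 5, 6, 7, 9, 10],
    "BBB": [10, 20, 30, 40, 11, 10],
    "CCC": [100, 50, 25, 10, 10, 11],
    "DDD": [98, 50, 25, 10, 10, 11],
    "EEE": [103, 50, 25, 10, 10, 11],
})


def close_cols(row, thresh):
    """Return the largest value in the row that is part of a pair closer than thresh."""
    max_value = None
    for k1, k2 in combinations(row.keys(), 2):
        if abs(row[k1] - row[k2]) < thresh:
            pair_max = max(row[k1], row[k2])
            max_value = pair_max if max_value is None else max(max_value, pair_max)
    return max_value


thresh = 5

# Row-wise apply (axis=1); extra keyword arguments are forwarded to close_cols.
row_max = df2.apply(close_cols, axis=1, thresh=thresh)

# Subtract each row's reference value (aligned on the index, axis=0)
# and blank out everything further than thresh away from it.
result = df2.mask(df2.sub(row_max, axis=0).abs() > thresh)
print(result)
```

Passing thresh through apply keeps the function free of global state, which is the point of the note above.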

Extra credit: I vectorized and embedded your closeCols for some mind-numbing fun. Notice there is no apply. The steps are:

  • numpy broadcasting to get all combinations of columns subtracted from each other.
  • np.abs to take the absolute value of those differences.
  • <= 5 to flag the differences that fall within the threshold.
  • sum(-1): I arranged the broadcasting so that the differences between, say, row 0, column AAA and every value in row 0 are laid out across the last dimension. The -1 in sum(-1) says to sum across that last dimension.
  • <= 1: every value is less than 5 away from itself, so each count includes the value itself. A value has a close neighbour only when its count is greater than 1, so we mask everything with a count less than or equal to one.

import numpy as np

v = df2.values
df2.mask((np.abs(v[:, :, None] - v[:, None]) <= 5).sum(-1) <= 1)

    AAA   BBB  CCC  DDD  EEE
0   NaN   NaN  100   98  103
1   NaN   NaN   50   50   50
2   NaN  30.0   25   25   25
3   7.0   NaN   10   10   10
4   9.0  11.0   10   10   10
5  10.0  10.0   11   11   11
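To make the broadcasting steps easier to follow, the one-liner can be unrolled into named intermediates. This is only a sketch: df2 is again a hypothetical frame (an assumption, since the original test data is not shown), and the variable names diff, close, n_close and isolated are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame (an assumption; the original test data is not shown here).
df2 = pd.DataFrame({
    "AAA": [4, 5, 6, 7, 9, 10],
    "BBB": [10, 20, 30, 40, 11, 10],
    "CCC": [100, 50, 25, 10, 10, 11],
    "DDD": [98, 50, 25, 10, 10, 11],
    "EEE": [103, 50, 25, 10, 10, 11],
})

thresh = 5
v = df2.to_numpy()  # shape (rows, cols)

# (rows, cols, 1) - (rows, 1, cols) broadcasts to (rows, cols, cols):
# diff[i, j, k] is the distance between columns j and k within row i.
diff = np.abs(v[:, :, None] - v[:, None, :])

# True where two values in the same row are within thresh of each other.
close = diff <= thresh

# Count the close values per cell; each value always counts itself once.
n_close = close.sum(-1)  # shape (rows, cols)

# A count of 1 means only the value itself was close, so mask it.
isolated = n_close <= 1
result = df2.mask(isolated)
print(result)
```

Note that this version keeps a value whenever it has at least one neighbour within the threshold, which is a slightly different rule from comparing against the row's largest close value; on this sample frame both approaches give the same result.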
