Find Closest Value Pairs And Calculate Mean In Python
I have a dataframe as follows:
import pandas as pd
import numpy as np
import random
np.random.seed(5)
df = pd.DataFrame(np.random.randint(100, size=(100, 3)), c
Solution 1:
I'd use numpy here:
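In [11]: x = df.values.copy()  # copy, so the in-place sort below cannot reorder df itself
In [12]: x.sort()  # sorts each row (last axis) in ascending order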
In [13]: (x[:, 1:] + x[:, :-1])/2
Out[13]:
array([[69.5, 88.5],
       [12. , 44.5],
       [28.5, 46. ],
       [41.5, 78. ],
       [34. , 66.5]])
In [14]: np.diff(x)
Out[14]:
array([[17, 21],
       [ 8, 57],
       [ 3, 32],
       [69,  4],
       [38, 27]])
In [15]: np.diff(x).argmin(axis=1)
Out[15]: array([0, 0, 0, 1, 1])
In [16]: ((x[:, 1:] + x[:, :-1])/2)[np.arange(len(x)), np.diff(x).argmin(axis=1)]
Out[16]: array([69.5, 12. , 28.5, 78. , 66.5])
In [17]: df["D"] = ((x[:, 1:] + x[:, :-1])/2)[np.arange(len(x)), np.diff(x).argmin(axis=1)]
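The session above works because, once a row is sorted, the closest pair among its values is always an adjacent pair, so only the neighbouring differences need to be compared. As a rough sketch, the same steps can be wrapped in a function. The name `closest_pair_mean`, the `cols` parameter and the `demo` frame are illustrative assumptions, not part of the original answer; the demo rows were chosen so that, once sorted, they reproduce the differences and means printed above.

```python
import numpy as np
import pandas as pd


def closest_pair_mean(frame, cols=("A", "B", "C")):
    """Mean of the two closest values in each row (hypothetical helper)."""
    # np.sort returns a sorted copy, so the DataFrame itself is left untouched.
    x = np.sort(frame[list(cols)].to_numpy(dtype=float), axis=1)
    gaps = np.diff(x, axis=1)              # differences between sorted neighbours
    mids = (x[:, 1:] + x[:, :-1]) / 2      # means of the same neighbouring pairs
    # For each row, pick the mean belonging to the smallest gap.
    return mids[np.arange(len(x)), gaps.argmin(axis=1)]


# Illustrative data (an assumption, not taken from the question).
demo = pd.DataFrame({
    "A": [99, 16, 62, 80, 15],
    "B": [78, 73, 27, 7, 53],
    "C": [61, 8, 30, 76, 80],
})
demo["D"] = closest_pair_mean(demo)
print(demo["D"].tolist())  # [69.5, 12.0, 28.5, 78.0, 66.5]
```

Selecting the columns explicitly also means the function can be re-run after `D` has been added without `D` leaking into the calculation.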
Solution 2:
Assuming that you require an additional column D holding the mean of the value pair with the least difference among the three possible pairs (colA, colB), (colB, colC) and (colC, colA), the following code should work:
Updated:
def meanFunc(row):
    # Keep only the values of the row that are not NaN
    nonNanValues = [x for x in list(row) if str(x) != 'nan']
    numOfNonNaN = len(nonNanValues)
    if numOfNonNaN == 0:
        return 0
    if numOfNonNaN == 1:
        return nonNanValues[0]
    if numOfNonNaN == 2:
        return np.mean(nonNanValues)
    if numOfNonNaN == 3:
        # Index of the pair with the smallest absolute difference
        minDiffPairIndex = np.argmin([abs(row['A']-row['B']), abs(row['B']-row['C']), abs(row['C']-row['A'])])
        meanDict = {0: np.mean([row['A'], row['B']]), 1: np.mean([row['B'], row['C']]), 2: np.mean([row['C'], row['A']])}
        return meanDict[minDiffPairIndex]

df['D'] = df.apply(meanFunc, axis=1)
The above code handles NaN values in a row as follows: if all three values are NaN, column D gets the value 0; if two values are NaN, the remaining non-NaN value is assigned to column D; and if exactly one value is NaN, the mean of the remaining two is assigned to column D.
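To see these NaN rules in action, here is a minimal sketch on a four-row frame, one row per case. It is a variant, not the code above: the helper name `closest_pair_mean_nan` and the `demo` data are assumptions made for illustration, and it uses `pd.notna` and `itertools.combinations` in place of the string comparison and the hard-coded pairs.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def closest_pair_mean_nan(row):
    """NaN-aware closest-pair mean for one row (hypothetical helper)."""
    vals = [v for v in row if pd.notna(v)]
    if len(vals) == 0:
        return 0            # all values missing
    if len(vals) == 1:
        return vals[0]      # only one value present
    # With two or three values, take the pair with the smallest absolute gap.
    a, b = min(combinations(vals, 2), key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2


# Illustrative data: 3, 2, 1 and 0 non-NaN values per row.
demo = pd.DataFrame({
    "A": [10.0, np.nan, np.nan, np.nan],
    "B": [12.0, 4.0, np.nan, np.nan],
    "C": [30.0, 8.0, 7.0, np.nan],
})
demo["D"] = demo[["A", "B", "C"]].apply(closest_pair_mean_nan, axis=1)
print(demo["D"].tolist())  # [11.0, 6.0, 7.0, 0.0]
```

When two pairs are exactly equally close, this variant may pick a different pair than the code above, because the pairs are compared in a different order.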
Previous:
def meanFunc(row):
    minDiffPairIndex = np.argmin([abs(row['A']-row['B']), abs(row['B']-row['C']), abs(row['C']-row['A'])])
    meanDict = {0: np.mean([row['A'], row['B']]), 1: np.mean([row['B'], row['C']]), 2: np.mean([row['C'], row['A']])}
    return meanDict[minDiffPairIndex]

df['D'] = df.apply(meanFunc, axis=1)
Hope I understood your question correctly.
Solution 3:
This may not be the fastest way of doing this but it's very straightforward.
def func(x):
    a, b, c = x
    diffs = np.abs(np.array([a-b, a-c, b-c]))
    means = np.array([(a+b)/2, (a+c)/2, (b+c)/2])
    return means[diffs.argmin()]

df["D"] = df.apply(func, axis=1)
df.head()
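Since the row-wise version may not be the fastest, one way to gain confidence before swapping in a sort-based NumPy variant is to check that both give the same column. The sketch below assumes a frame with exactly the three columns A, B and C; the `demo` data, the seed and the variable names are illustrative choices. Random floats are used so that exact ties between pairs, where the two approaches could legitimately choose different pairs, are practically ruled out.

```python
import numpy as np
import pandas as pd


def func(x):
    # Row-wise version: compare all three pairs directly.
    a, b, c = x
    diffs = np.abs(np.array([a - b, a - c, b - c]))
    means = np.array([(a + b) / 2, (a + c) / 2, (b + c) / 2])
    return means[diffs.argmin()]


# Illustrative data (an assumption): 1000 rows of random floats in [0, 1).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((1000, 3)), columns=["A", "B", "C"])

row_wise = demo.apply(func, axis=1).to_numpy()

# Sort-based version: after sorting, the closest pair is always adjacent.
x = np.sort(demo.to_numpy(), axis=1)
mids = (x[:, 1:] + x[:, :-1]) / 2
vectorized = mids[np.arange(len(x)), np.diff(x, axis=1).argmin(axis=1)]

print(np.allclose(row_wise, vectorized))
```

Note that `func` unpacks exactly three values, so it should be applied before a fourth column such as `D` is added, or to an explicit selection like `df[["A", "B", "C"]]`.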