Looking To Transform Continuous Variables Into Categorical
Sample Data:
id       val1 val2 val3 val4 val5 val6 val7
///+8yr  NaN  0.0  2.0  NaN  1    3    23
///1vjh  NaN  NaN  NaN  NaN  NaN  7    62
///4wu   3    NaN  6    NaN  7    8    180
Solution 1:
IIUC, you have two questions. The first, replacing values larger than 5 with 'Larger than 5', can be done with boolean indexing; the second, grouping val7 into ranges, can be done with pd.cut().
DEMO:
import numpy as np
import pandas as pd
d = pd.read_clipboard()  # reads the sample data copied from the question
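pd.read_clipboard() only works when the table is actually on the clipboard. A minimal sketch that builds the same frame by hand instead, assuming the ///4wu row holds the values that appear in the outputs further down (val1 = 3, val3 = 6, val5 = 7, val6 = 8, val7 = 180, everything else NaN):

```python
import numpy as np
import pandas as pd

# Sample data typed in directly; the ///4wu row is assumed from the printed outputs
d = pd.DataFrame({
    'id':   ['///+8yr', '///1vjh', '///4wu'],
    'val1': [np.nan, np.nan, 3],
    'val2': [0.0, np.nan, np.nan],
    'val3': [2.0, np.nan, 6],
    'val4': [np.nan, np.nan, np.nan],
    'val5': [1, np.nan, 7],
    'val6': [3, 7, 8],
    'val7': [23, 62, 180],
})
print(d)
```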
Part 1
Get the values that do not satisfy the larger-than-5 criterion:
rest = d.loc[:,'val1':'val6'][~(d.loc[:,'val1':'val6'] >5)]
rest
val1 val2 val3 val4 val5 val6
0 NaN 0.0 2.0 NaN 1.0 3.0
1 NaN NaN NaN NaN NaN NaN
2 3.0 NaN NaN NaN NaN NaN
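Indexing a DataFrame with a boolean DataFrame, as in the line above, behaves like DataFrame.where: cells where the mask is True are kept and all others become NaN. A small sketch on a hypothetical frame (not the question's data) that checks the two spellings agree:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; NaN > 5 evaluates to False, so NaN cells pass the ~ mask and stay NaN
df = pd.DataFrame({'a': [1.0, 7.0, np.nan], 'b': [6.0, 2.0, 9.0]})
mask = ~(df > 5)

via_indexing = df[mask]    # boolean-DataFrame indexing
via_where = df.where(mask) # explicit where() with the same mask
print(via_indexing.equals(via_where))
```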
Get the values larger than 5:
larger_than_5=d.loc[:,'val1':'val6'][d.loc[:,'val1':'val6'] >5]
print(larger_than_5)
val1 val2 val3 val4 val5 val6
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN 7.0
2 NaN NaN 6.0 NaN 7.0 8.0
Replace those values with the label:
larger_than_5[larger_than_5.notnull()] ='Larger than 5'
print(larger_than_5)
val1 val2 val3 val4 val5 val6
0 NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN Larger than 5
2 NaN NaN Larger than 5 NaN Larger than 5 Larger than 5
Update rest with the labelled values:
rest.update(larger_than_5)
print(rest)
val1 val2 val3 val4 val5 val6
0 NaN 0.0 2 NaN 1 3
1 NaN NaN NaN NaN NaN Larger than 5
2 3.0 NaN Larger than 5 NaN Larger than 5 Larger than 5
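DataFrame.update works in place (it returns None) and copies over only the non-NA cells of the other frame, which is why the NaN positions in larger_than_5 leave rest untouched. A minimal sketch with hypothetical numeric frames:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: only the non-NaN cells of `patch` should overwrite `target`
target = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})
patch = pd.DataFrame({'x': [np.nan, 20.0, np.nan], 'y': [40.0, np.nan, np.nan]})

target.update(patch)  # modifies target in place
print(target)
```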
Write the updated values back into the original DataFrame:
d.loc[:,'val1':'val6'] = rest
print(d)
id val1 val2 val3 val4 val5 val6 \
0 ///+8yr NaN 0.0 2 NaN 1 3
1 ///1vjh NaN NaN NaN NaN NaN Larger than 5
2 ///4wu 3.0 NaN Larger than 5 NaN Larger than 5 Larger than 5
val7
0 23
1 62
2 180
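Part 1 can also be written in one step with DataFrame.mask, a swapped-in technique rather than the approach above. mask() replaces cells where the condition is True and leaves everything else, NaN included, as it was. Writing strings into float columns in place (as the loc assignment and update steps do) may emit a FutureWarning about incompatible dtypes in recent pandas versions; assigning whole columns sidesteps that. A sketch assuming the same sample data (third row taken from the outputs):

```python
import numpy as np
import pandas as pd

# Sample data; the ///4wu row is assumed from the printed outputs
d = pd.DataFrame({
    'id':   ['///+8yr', '///1vjh', '///4wu'],
    'val1': [np.nan, np.nan, 3],
    'val2': [0.0, np.nan, np.nan],
    'val3': [2.0, np.nan, 6],
    'val4': [np.nan, np.nan, np.nan],
    'val5': [1, np.nan, 7],
    'val6': [3, 7, 8],
    'val7': [23, 62, 180],
})

cols = d.loc[:, 'val1':'val6'].columns
sub = d[cols]
# Replace cells > 5 with the label; the result is a new (object-dtype) frame,
# so assigning whole columns never tries to write strings into float storage
d[cols] = sub.mask(sub > 5, 'Larger than 5')
print(d)
```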
Part 2
Build bin edges every 30 units, from 0 up to the maximum of val7:
bins = np.arange(0, d['val7'].max()+1, 30)
bins
array([ 0, 30, 60, 90, 120, 150, 180], dtype=int64)
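The stop value given to np.arange decides whether the largest value gets a bin, and pd.cut's intervals are open on the left by default, so a value of exactly 0 falls outside (0, 30]. A sketch on hypothetical values whose maximum (170) is not a multiple of 30, comparing a stop of max() + 1 with max() + 30 plus include_lowest=True:

```python
import numpy as np
import pandas as pd

# Hypothetical values: 0 sits on the lowest edge, 170 is not a multiple of 30
s = pd.Series([0, 23, 62, 170])

# np.arange(0, 171, 30) ends at 150, so 170 gets NaN; 0 also gets NaN because (0, 30] excludes 0
short_bins = np.arange(0, s.max() + 1, 30)
print(pd.cut(s, short_bins).isna().tolist())

# A stop of max() + 30 always yields an edge at or above the maximum;
# include_lowest=True closes the first interval on the left so 0 is kept
full_bins = np.arange(0, s.max() + 30, 30)
print(pd.cut(s, full_bins, include_lowest=True).isna().tolist())
```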
Assign each val7 value to its bin:
val7_groups = pd.cut(d['val7'], bins)
val7_groups
0 (0, 30]
1 (60, 90]
2 (150, 180]
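pd.cut returns a categorical Series whose categories are all of the intervals, including empty ones, so per-bin counts can be listed in bin order. A short sketch using the val7 values from the question:

```python
import pandas as pd

val7 = pd.Series([23, 62, 180], name='val7')
bins = [0, 30, 60, 90, 120, 150, 180]
groups = pd.cut(val7, bins)

print(groups.dtype)                       # a CategoricalDtype of intervals
print(groups.value_counts(sort=False))    # one row per bin, empty bins show 0
```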
Add the result to the DataFrame as a new column:
d['val7_groups'] = val7_groups
print(d)
id val1 val2 val3 val4 val5 val6 \
0 ///+8yr NaN 0.0 2 NaN 1 3
1 ///1vjh NaN NaN NaN NaN NaN Larger than 5
2 ///4wu 3.0 NaN Larger than 5 NaN Larger than 5 Larger than 5
val7 val7_groups
0 23 (0, 30]
1 62 (60, 90]
2 180 (150, 180]
You can also name the groups by passing a list to the labels parameter of pd.cut(); it needs exactly one label per bin, i.e. len(bins) - 1 labels.
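A sketch of the labels parameter, using the val7 values and the 30-wide bins from above; the "lo-hi" label text is a hypothetical choice:

```python
import numpy as np
import pandas as pd

val7 = pd.Series([23, 62, 180], name='val7')
bins = np.arange(0, 181, 30)  # edges 0, 30, ..., 180

# Hypothetical label text, one per interval: len(bins) - 1 labels in total
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]
val7_groups = pd.cut(val7, bins, labels=labels)
print(val7_groups.tolist())
```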