Pandas: How to Derive Values for a New Column Based on Another Column
I have a dataframe with a column in which each value is a list. I want to derive a new column that only considers lists whose size is greater than 1, and assigns a unique integ
Solution 1:
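We can use np.random.choice to draw unique random values and .loc to assign them to the qualifying rows, i.e.:
import numpy as np
import pandas as pd
df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})
x = df['document_no_list'].apply(len) > 1  # True where the list has more than one element
df.loc[x, 'Cluster'] = np.random.choice(range(len(df)), x.sum(), replace=False)  # unique random ids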
Output:
document_no_list Cluster
0 [1, 2, 3] 2.0
1 [4, 5, 6, 7] 1.0
2 [8] NaN
3 [9, 10] 3.0
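Because the ids above are drawn at random, they will differ from run to run. If reproducible ids are wanted, one possible sketch is to draw from a seeded NumPy generator. The seed value 0 and the name mask are illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list has more than one element
mask = df['document_no_list'].map(len) > 1

# Seeded generator (seed 0 is an arbitrary choice) so repeated runs give the same ids
rng = np.random.default_rng(0)

# Draw one distinct value from 0..len(df)-1 for each qualifying row
df.loc[mask, 'Cluster'] = rng.choice(len(df), size=int(mask.sum()), replace=False)
```

Rows whose list has a single element are left as NaN, as in the answer above.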
If you want consecutive numbers instead, you can use:
df.loc[x,'Cluster'] = np.arange(x.sum())+1
document_no_list Cluster
0 [1, 2, 3] 1.0
1 [4, 5, 6, 7] 2.0
2 [8] NaN
3 [9, 10] 3.0
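The ids print as floats (1.0, 2.0, ...) because the column has to hold NaN for the skipped row. If integer ids are preferred, a minimal sketch is to cast the column to pandas' nullable integer dtype afterwards. The cast is an addition to the answer above, and it assumes a pandas version that supports the 'Int64' dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list has more than one element
x = df['document_no_list'].apply(len) > 1

# Consecutive ids 1..k for the qualifying rows; other rows become NaN
df.loc[x, 'Cluster'] = np.arange(x.sum()) + 1

# Nullable integer dtype keeps whole numbers and shows <NA> for the missing row
df['Cluster'] = df['Cluster'].astype('Int64')
```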
Hope it helps
Solution 2:
Create an indicator column (1 where the list has more than one element, 0 otherwise), then replace the 1s with their running total using cumsum():
df['cluster_id'] = df['document_no_list'].apply(lambda x: len(x)> 1).astype(int)
df.loc[df['cluster_id'] == 1, 'cluster_id'] = df.loc[df['cluster_id'] == 1, 'cluster_id'].cumsum()
document_no_list cluster_id
0 [1, 2, 3] 1
1 [4, 5, 6, 7] 2
2 [8] 0
3 [9, 10] 3
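The same result can probably be reached in one step by taking the cumulative sum of the boolean mask and zeroing out the rows that do not qualify. This is a sketch of a variant, not the answerer's code, and the name mask is an assumption:

```python
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list has more than one element
mask = df['document_no_list'].map(len) > 1

# The running count of True values numbers the qualifying rows 1, 2, 3, ...
# where(mask, 0) keeps that count on qualifying rows and puts 0 elsewhere
df['cluster_id'] = mask.cumsum().where(mask, 0)

print(df)
```

Since no NaN is introduced, the column stays an integer column, matching the 0 shown for the single-element row above.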