
Pandas: How to Derive Values for a New Column Based on Another Column

I have a DataFrame with a column in which each value is a list. I want to derive a new column that considers only the lists whose size is greater than 1, and assigns a unique integ

Solution 1:

We can use np.random.choice to draw unique random values and .loc to assign them to the matching rows, as follows:

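import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# Boolean mask: True where the list has more than one element
x = df['document_no_list'].apply(len) > 1

# Draw x.sum() distinct integers from 0..len(df)-1 and assign them to the masked rows
df.loc[x, 'Cluster'] = np.random.choice(range(len(df)), x.sum(), replace=False)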

Output:

  document_no_list  Cluster
0        [1, 2, 3]      2.0
1     [4, 5, 6, 7]      1.0
2              [8]      NaN
3          [9, 10]      3.0
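
Because np.random.choice draws different values on every run, the Cluster numbers shown above will change from run to run. Here is a minimal sketch of one way to make them repeatable, assuming NumPy's Generator API (np.random.default_rng) is available. The seed value 0 and the variable name mask are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list holds more than one element
mask = df['document_no_list'].apply(len) > 1

# Seeded generator; the seed 0 is an arbitrary assumption for repeatability
rng = np.random.default_rng(0)

# Draw mask.sum() distinct integers from 0..len(df)-1 for the selected rows
df.loc[mask, 'Cluster'] = rng.choice(len(df), size=mask.sum(), replace=False)

print(df)
```

Rows that fail the condition are left as NaN, exactly as in the output above.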

If you want consecutive numbers instead, you can use:

df.loc[x, 'Cluster'] = np.arange(x.sum()) + 1

Output:

  document_no_list  Cluster
0        [1, 2, 3]      1.0
1     [4, 5, 6, 7]      2.0
2              [8]      NaN
3          [9, 10]      3.0
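
The Cluster values print as floats (1.0, 2.0, ...) because the unassigned row holds NaN, which forces a float column. If integer labels are preferred, one possible approach, sketched here on the assumption that pandas' nullable Int64 dtype suits your use case, is to convert the column after assignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})
x = df['document_no_list'].apply(len) > 1

# Same consecutive numbering as above
df.loc[x, 'Cluster'] = np.arange(x.sum()) + 1

# Convert float-with-NaN to nullable integers; NaN becomes <NA>
df['Cluster'] = df['Cluster'].astype('Int64')

print(df)
```

The missing row is then shown as <NA> rather than NaN, and the remaining values are stored as integers.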

Hope it helps

Solution 2:

Create an indicator column (1 where the list has more than one element, 0 otherwise), then replace the 1s with their running total using cumsum():

df['cluster_id'] = df['document_no_list'].apply(lambda x: len(x)> 1).astype(int)

df.loc[df['cluster_id'] == 1, 'cluster_id'] = df.loc[df['cluster_id'] == 1, 'cluster_id'].cumsum()


Output:

  document_no_list  cluster_id
0        [1, 2, 3]           1
1     [4, 5, 6, 7]           2
2              [8]           0
3          [9, 10]           3
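
The same result can probably be reached in one step by taking the cumulative sum of the boolean mask itself and zeroing out the rows that fail the condition. This is a sketch of a variant, not the original answerer's code, and the name mask is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'document_no_list': [[1, 2, 3], [4, 5, 6, 7], [8], [9, 10]]})

# True for rows whose list holds more than one element
mask = df['document_no_list'].apply(len) > 1

# Running count of True values, kept only where mask is True; other rows get 0
df['cluster_id'] = mask.cumsum().where(mask, 0)

print(df)
```

Since no NaN is introduced, the column stays an integer column, with 0 marking rows that have no cluster.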
