Pandas: For Groups Of Rows Where 2 Or More Particular Columns Values Are Exactly The Same, How To Assign A Unique Integer As A New Column
In a Pandas dataframe, I have groups of rows where the values in 2 particular columns are exactly the same. How do I add a new column that assigns a unique integer to each such group of rows?
Solution 1:
Using groupby with sort=False and ngroup:
df[3] = df.groupby([1,2], sort=False).ngroup()+1
Out[1261]:
0 1 2 3
0 plane1 az 1
1 plane2 az 1
2 plane3 az 2
3 plane4 az 2
4 plane5 ny 3
5 plane6 ny 3
6 plane7 fl fl 4
7 plane8 fl fl 4
8 plane10 de 5
9 plane11 de 5
10 plane12 mo 6
11 plane13 mo 6
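For reference, here is a minimal self-contained sketch of the same idea. The sample frame is made up for illustration (it is not the asker's data); only the integer column labels 0, 1, 2 and the groupby call follow the answer above.

```python
import pandas as pd

# Hypothetical sample data; the integer column labels mirror the question's frame.
df = pd.DataFrame({
    0: ["plane1", "plane2", "plane3", "plane4", "plane5"],
    1: ["az", "az", "", "", "fl"],
    2: ["", "", "az", "az", "fl"],
})

# sort=False numbers the groups in order of first appearance;
# ngroup() is 0-based, so +1 makes the ids start at 1.
df[3] = df.groupby([1, 2], sort=False).ngroup() + 1
print(df[3].tolist())  # [1, 1, 2, 2, 3]

# With the default sort=True the ids follow the sorted group keys instead.
sorted_ids = df.groupby([1, 2], sort=True).ngroup() + 1
print(sorted_ids.tolist())  # [2, 2, 1, 1, 3]
```

Rows that share the same pair of values in columns 1 and 2 get the same id; the sort flag only changes which integer each group receives.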
Solution 2:
In your case, you can use factorize after converting each row's values in the two columns to a tuple:
df[3]=pd.factorize(df[[1,2]].apply(tuple,1))[0]+1
df
0 1 2 3
0 plane1 az 1
1 plane2 az 1
2 plane3 az 2
3 plane4 az 2
4 plane5 ny 3
5 plane6 ny 3
6 plane7 fl fl 4
7 plane8 fl fl 4
8 plane10 de 5
9 plane11 de 5
10 plane12 mo 6
11 plane13 mo 6
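A small runnable sketch of the tuple-plus-factorize approach, under the assumption of a made-up sample frame (the values below are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data with the same integer column labels as above.
df = pd.DataFrame({
    0: ["plane1", "plane2", "plane3", "plane4", "plane5"],
    1: ["az", "az", "", "", "fl"],
    2: ["", "", "az", "az", "fl"],
})

# Build one tuple per row from the two key columns.
keys = df[[1, 2]].apply(tuple, axis=1)

# factorize returns (codes, uniques); codes are 0-based and follow
# the order in which each distinct tuple first appears.
df[3] = pd.factorize(keys)[0] + 1
print(df[3].tolist())  # [1, 1, 2, 2, 3]
```

Because the whole tuple is compared, ("az", "") and ("", "az") are treated as different groups.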
Or, by concatenating the two columns into one string per row:
pd.factorize(df[[1,2]].replace('',' ').sum(1))[0]+1
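The replace('',' ') step matters when one of the two columns can be empty. The sketch below illustrates why, using made-up data and plain `+` concatenation of the two columns in place of `.sum(1)` (an assumption made to keep the example simple; for two string columns the concatenated result is the same).

```python
import pandas as pd

# Hypothetical sample data: the same value sits in different columns.
df = pd.DataFrame({
    1: ["az", "az", "", "", "fl"],
    2: ["", "", "az", "az", "fl"],
})

# Without the replacement, ("az", "") and ("", "az") both become "az".
naive = df[1] + df[2]
naive_ids = pd.factorize(naive)[0] + 1
print(naive_ids.tolist())  # [1, 1, 1, 1, 2]

# Replacing "" with " " keeps the position of the empty value visible.
padded = df[[1, 2]].replace("", " ")
safe = padded[1] + padded[2]
safe_ids = pd.factorize(safe)[0] + 1
print(safe_ids.tolist())  # [1, 1, 2, 2, 3]
```

String concatenation can still merge distinct pairs in general (for example "a" + "bc" and "ab" + "c"), so the tuple-based variants are the safer choice when the values are arbitrary.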
Or using category with cat.codes
df[[1,2]].apply(tuple,1).astype('category').cat.codes
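A short sketch of the categorical variant on made-up data. Note that cat.codes is 0-based and the codes follow the order of the categories (usually sorted), not the order of first appearance, so the integers can differ from the factorize result even though the grouping is the same.

```python
import pandas as pd

# Hypothetical sample data with the same key columns as above.
df = pd.DataFrame({
    1: ["az", "az", "", "", "fl"],
    2: ["", "", "az", "az", "fl"],
})

# One tuple per row, converted to a categorical Series.
keys = df[[1, 2]].apply(tuple, axis=1)
codes = keys.astype("category").cat.codes

# Shift by 1 if the ids should start at 1 like in the other solutions.
ids = codes + 1
print(ids.tolist())
```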
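And if you just want a distinct value per group, rather than consecutive integers, you can use hash:
df[[1,2]].apply(tuple,1).apply(hash)
Note that Python randomizes string hashes per process by default, so these values can change from one run to the next.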